C++ get individual words from a string

post #1 of 12
Thread Starter 
Hi,

I have to get individual words from a huge string (actually lyrics to a song). I think I vaguely remember something using cstring functions to get characters until I hit a blank character/newline, but I can't recall exactly. I also recall that I could use stringstreams for the same solution.

Can anyone help me with this?
post #2 of 12
If you're using cstring you can use strtok. Not sure about a C++ equivalent, but the concept you're looking for is tokenizing strings. I'm sure punching "C++ tokenize string" into Google will get you the answer you need.
post #3 of 12
Tokenizing is splitting a string into pieces (tokens) at delimiter characters.

Can you give me an example of what you're trying to accomplish? I'm not sure whether you're trying to tokenize or not; your OP wasn't very clear.
post #4 of 12
Stringstreams should work. Use an istringstream and a string. Example:
Code:

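#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main(){

  istringstream ss("your text goes here"); // wrap the input text in a stream
  string s;

  // operator>> skips whitespace and reads one word per iteration
  while(ss >> s){
      cout << s << endl; // do stuff with your string (individual words) here
  }

  return 0;
}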
post #5 of 12
Thread Starter 
Sounds like strtok is what I need. Thanks!

Plex: Here is an example.

This is the lyrics of a song
aren't they great?

I need: this, is, the, lyrics, of, a, song, aren't, they, great

Actually, I just realized this is a problem. How do I make sure all of the character cases are the same? (i.e. This->this) I'm sure a quick google search will do. Don't worry about it: consider the case closed unless I post otherwise.

Thanks everyone!
post #6 of 12
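strcmp and its variants. Plain strcmp is case-sensitive; the case-insensitive versions (strcasecmp on POSIX, _stricmp on MSVC) are not part of standard C++.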
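
For the This->this question, one portable option is to compare two words character by character after lower-casing each character. The sketch below is only an illustration: `equalsIgnoreCase` is a made-up helper name, not a standard function, and it assumes plain ASCII text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if a and b match when letter case is ignored.
// Assumes single-byte (ASCII) text.
bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast to unsigned char first: passing a negative char to tolower is undefined.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

With this, equalsIgnoreCase("This", "this") is true, and neither string is modified.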
post #7 of 12
OP, are you using string objects (the C++ thing to do) or are your strings simply char arrays (aka cstrings)?

If they are string objects (probable) then you could use strtok as follows, assuming the input string is named 'input':

Code:
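string words[350]; //max of 350 words
int index = 0;

// strtok modifies its argument, so give it a writable copy of the text
// (needs <cstring>, <string> and <vector>)
vector<char> buf(input.c_str(), input.c_str() + input.size() + 1);

char *tok = strtok(&buf[0], " ,.-");
while(tok != NULL && index < 350)
{
   words[index] = tok; // copy the current token into the array
   index++;
   tok = strtok(NULL, " ,.-"); // NULL means "continue with the same buffer"
}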


It's very crude, but it should store each word in the string array (as long as there are fewer than 350).
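
If you would rather stay with string objects and avoid the fixed 350-word limit, the same idea can be sketched with std::string's own search functions and a vector. This is a rough sketch, not the only way to do it: `splitWords` is a hypothetical helper name, and adding '\n' to the delimiter list is my assumption so that multi-line lyrics split correctly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `input` at any character found in `delims`.
std::vector<std::string> splitWords(const std::string& input,
                                    const std::string& delims = " ,.-\n") {
    std::vector<std::string> words;
    // Skip leading delimiters to find the start of the first word.
    std::size_t start = input.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The word ends at the next delimiter (or at the end of the string).
        std::size_t end = input.find_first_of(delims, start);
        words.push_back(input.substr(start, end - start));
        // When end is npos this search also returns npos, which ends the loop.
        start = input.find_first_not_of(delims, end);
    }
    return words;
}
```

Characters that are not in the delimiter list (a trailing '?', for example) stay attached to the word, so add them to `delims` if you want them stripped.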
post #8 of 12
Quote:
Originally Posted by flushentitypacket;13222821 
Sounds like strtok is what I need. Thanks!

Plex: Here is an example.

This is the lyrics of a song
aren't they great?

I need: this, is, the, lyrics, of, a, song, aren't, they, great

Actually, I just realized this is a problem. How do I make sure all of the character cases are the same? (i.e. This->this) I'm sure a quick google search will do. Don't worry about it: consider the case closed unless I post otherwise.

Thanks everyone!


Splitting by whitespace is super easy; stringstreams can do all that legwork.
Code:
// needs <sstream>, <string> and <vector>
string song("This is the lyrics of a song aren't they great?");
string songBuf;
stringstream ssSong(song);
vector<string> lyrics;

// each >> extracts the next whitespace-delimited word
while (ssSong >> songBuf)
    lyrics.push_back(songBuf);

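Each >> extraction skips any leading whitespace and then reads characters up to the next whitespace, so songBuf holds exactly one word per iteration, and that word is pushed onto the vector. Punctuation is not whitespace, so the last word comes out as "great?" with the question mark still attached.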

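As far as the case goes, C++ has built-in functions for that: toupper() and tolower() from <cctype>. They convert one character at a time, so you apply them to every character in the string. You can use these to make sure it's all lower case before or after the transformation above. I would recommend before.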
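
A minimal sketch of the lower-casing step, assuming plain ASCII lyrics. `lowerChar` and `toLowerCopy` are hypothetical helper names, not library functions:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case a single character.
// The unsigned char cast avoids undefined behavior for negative char values.
char lowerChar(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: return a lower-case copy of `text`.
std::string toLowerCopy(std::string text) {
    // Apply lowerChar to every character, writing the result back in place.
    std::transform(text.begin(), text.end(), text.begin(), lowerChar);
    return text;
}
```

Calling toLowerCopy(song) before building the stringstream would give you "this", "is", "the" and so on straight out of the loop.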
Edited by Plex - 4/22/11 at 7:07am
post #9 of 12
Thread Starter 
I've decided to implement using cstrings.

I'm converting from a string to a cstring, but I don't understand why the sample code from the C++ Reference makes an array of str.size()+1 instead of just str.size(). Anyone know why?

// strings and c-strings
#include <iostream>
#include <cstring>
#include <string>
using namespace std;

int main ()
{
  char * cstr, *p;

  string str ("Please split this phrase into tokens");

  cstr = new char [str.size()+1];
  strcpy (cstr, str.c_str());

  // cstr now contains a c-string copy of str

  p=strtok (cstr," ");
  while (p!=NULL)
  {
    cout << p << endl;
    p=strtok(NULL," ");
  }

  delete[] cstr;
  return 0;
}
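
On the size()+1 question: str.size() counts only the characters of the string, while a cstring needs one more byte for the terminating '\0' that marks its end. c_str() returns the characters followed by that terminator, and strcpy copies the terminator too, so a buffer of only str.size() bytes would be overrun by one. A small sketch of the same copy, using a vector instead of new[] so nothing has to be deleted by hand (`toWritableCString` is a hypothetical helper name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy `str` into a writable, null-terminated buffer.
std::vector<char> toWritableCString(const std::string& str) {
    // size() characters plus one extra slot for the terminating '\0'.
    std::vector<char> buffer(str.size() + 1);
    // strcpy copies every character of c_str() and then the '\0' itself.
    std::strcpy(&buffer[0], str.c_str());
    return buffer;
}
```

The resulting buffer can be handed to strtok as &buffer[0], the same way cstr is used in the sample above.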
post #10 of 12
Thread Starter 
..