Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › Web Coding › Web Spider without database?

Web Spider without database?

post #1 of 2
Thread Starter 
I have a design idea for a spider/crawler that will only search X sites. I'm curious how people would suggest going about this side project. I have looked through several tutorials, but they all seem to use a database. I am learning PHP and would like to take a crack at it in PHP, but I wanted to know what some of you think or suggest.
post #2 of 2
Some form of "database" is required, whether it is a flat file, a relational database, or something else. Without one, the spider would be stuck on the same site forever: it wouldn't know which pages it had already indexed, so every link on every page would look brand new.
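To illustrate the point about needing some record of what has been indexed, here is a minimal sketch of the flat-file option in Python. The names `SeenFile`, `crawl`, `get_links`, and `FAKE_SITE` are all hypothetical and made up for this example, and the link graph is fake data standing in for real page fetches. It is one possible approach, not a full crawler.

```python
import os
from collections import deque


class SeenFile:
    """Tracks already-seen URLs in a plain text file, one URL per line."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Reload URLs recorded by earlier runs, if the file exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.seen = {line.strip() for line in f if line.strip()}

    def add(self, url):
        """Record url; return True if it is new, False if already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(url + "\n")
        return True


def crawl(start, get_links, seen):
    """Breadth-first walk that visits each URL at most once.

    get_links is a hypothetical callable returning the links on a page.
    """
    queue = deque([start]) if seen.add(start) else deque()
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Only queue links the "database" has not recorded yet.
            if seen.add(link):
                queue.append(link)
    return visited


# Fake link graph with cycles (assumption: stands in for real HTTP fetches).
FAKE_SITE = {"/a": ["/b", "/c"], "/b": ["/a"], "/c": ["/a", "/b"]}
```

Because the pages link back to each other, a crawler with no record of seen URLs would loop forever on this graph. With the file in place the walk ends after three pages, and a second run against the same file has nothing new to visit.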

For ease of use I would recommend a SQL database. It's not the best for performance, but it has the most tutorials available. This is good for "small" search engines: think fewer than 500 million pages with no post-index analytics.
If you want the highest performance (and you're competent in all things technical), then something like Cassandra or Mongo would be good. However, unless you plan to get serious, this would be more hassle than it's worth. You would need to be running a Gbps server with heavy analytics and tons of information being stored and analyzed: more than 500 million pages indexed plus 20+ gigabytes written daily, compressed.
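For the SQL route, a rough sketch of what the bookkeeping could look like is below. It assumes SQLite through Python's built-in `sqlite3` module purely because it needs no server; the `pages` table, its columns, and the helper function names are hypothetical choices for this example, not a recommended schema.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the crawl index; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " crawled INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, url):
    """Insert url if unseen; return True when a new row was added."""
    # The PRIMARY KEY makes duplicates conflict, and OR IGNORE skips them.
    cur = conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1


def next_uncrawled(conn):
    """Return the oldest URL not yet crawled, or None when done."""
    row = conn.execute(
        "SELECT url FROM pages WHERE crawled = 0 ORDER BY rowid LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_crawled(conn, url):
    """Flag url as crawled so it is never handed out again."""
    conn.execute("UPDATE pages SET crawled = 1 WHERE url = ?", (url,))
    conn.commit()
```

The crawl loop then becomes: take `next_uncrawled`, fetch the page, `enqueue` every link found, and `mark_crawled` the page. Passing a filename instead of `":memory:"` keeps the index between runs.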