Looking For A Giga-Record Database - Page 2

post #11 of 15
Quote:
Originally Posted by parityboy View Post

lol guys, thanks for the replies (and the in-thread convo, lol). I discounted relational for performance reasons, simply because the data I'm looking to store maps better to document and K/V structures. After looking at the fragment I posted, I'm beginning to think K/V is the way to go, but I know little about them apart from the basic concept.
What's your experience with K/V databases as a developer? Are they easy to use? What do you use to query them - is there a common query language?

I've only ever written bespoke in-memory K/V databases, so I can't really be much help. But if you know the values you need each time (i.e. it's always going to have those 17 fields per record), then you could always create a multi-dimensional array and store them in there. The memory requirements might be huge for some of the larger datasets, but you're going to run into that issue with whichever solution you opt for. So the only other workaround would be either to discard surplus information (and then read that back from the net if/when requested), or to use a staggered load where only (for example) 50 records are loaded at a time.
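A minimal sketch of the two ideas above, a fixed-field in-memory store plus a staggered load in batches of 50. It is only an illustration, not anyone's actual code: the field names, the `HeaderStore` class, the `batches` helper and the fake header source are all hypothetical, and only four fields are shown where the real records would have 17.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical field names; the real records would carry 17 fields.
FIELDS: Tuple[str, ...] = ("subject", "poster", "date", "bytes")

Record = Tuple[str, ...]


class HeaderStore:
    """Toy in-memory key/value store: message ID -> fixed-width tuple."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def put(self, key: str, values: Iterable[str]) -> None:
        row = tuple(values)
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
        self._rows[key] = row

    def get(self, key: str, field: str) -> str:
        # Look the column up by name, then index into the stored tuple.
        return self._rows[key][FIELDS.index(field)]

    def __len__(self) -> int:
        return len(self._rows)


def batches(
    source: Iterable[Tuple[str, Record]], size: int = 50
) -> Iterator[List[Tuple[str, Record]]]:
    """Yield lists of at most `size` items so only one batch is held at a time."""
    it = iter(source)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fake_headers(n: int) -> Iterator[Tuple[str, Record]]:
    """Stand-in for headers arriving from a server (made-up data)."""
    for i in range(n):
        yield f"<msg{i}@example.invalid>", (f"subject {i}", "poster", "2012-09-08", str(i))


store = HeaderStore()
sizes = []
for chunk in batches(fake_headers(120), size=50):
    sizes.append(len(chunk))
    for key, row in chunk:
        store.put(key, row)
```

Because `fake_headers` is a generator, the full set of 120 records never exists as one list; whether that helps in practice depends on how much is kept in the store afterwards.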
post #12 of 15
Thread Starter 
Was thinking along the same lines. I don't think memory will be an issue when displaying headers from the local database, but I do think it will be an issue when pulling the headers from the server for the initial and subsequent updates.

Basically, I'm gonna have to write some test code and see how it performs. I'm also going to study the code of applications like KLibido and KNode and see how they do things.
Edited by parityboy - 9/8/12 at 3:34pm
Mythica (14 items)
CPU: Intel i3 530
Motherboard: Gigabyte GA-H55M-D2H
Graphics: Palit nVidia GT430
RAM: Corsair Dominator 4GB TW3X4G1333C9A
Hard Drive: Hitachi Deskstar 7K500
Hard Drive: Samsung HD204UI
OS: Linux Mint 13
Monitor: HP L1800
Keyboard: Trust EasyScroll Silverline
Power: Corsair HX520
Case: Lian-Li PC-A04B
Mouse: Logitech Trackman Wheel
post #13 of 15
I know you're discounting relational databases, but Newznab (which does a lot of what you're describing) uses MySQL and powers nzb.su and nzbs.org. Both have HUGE amounts of data.
post #14 of 15
Thread Starter 
@hometoast

I discounted MySQL in particular - and relational in general - mainly for performance and dependency reasons. I run KDE 4.8.x, and the desktop and email indexers (a combination of Akonadi, virtuoso-t and MySQL) require a rather large amount of horsepower, at least for the initial indexing. My desktop runs an Intel i3 530 and two 500GB Hitachis in RAID 1 - after watching the indexers thrash my hard disks and beat up my CPU, I switched off the indexer.

If I was to use a private MySQL instance I'd likely run into the same performance issues; the sites you mention no doubt have a cluster of servers with tens of gigabytes of RAM, sharded databases and Memcached. The average desktop will not have those resources, and I'll likely have to deal with similar numbers of headers - i.e. numbered in the billions. Additionally, if I'm going to use an external mysqld, I may as well integrate into Akonadi.

Therefore I need a data structure which will map as closely as possible to the data I'm trying to store. Having thought about it, using an SQL database I'll likely have to put all of the groups in one table, then each distinct post in its own table, creating and destroying tables on-the-fly.

So for example if somebody posted "canonical_ubuntu_12_10.iso" as a set of RARs, I'll probably have to create a table on-the-fly named "canonical_ubuntu_12_10.iso", then every header for every article representing every part of every RAR would have to be inserted into the table.
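As a rough sketch of what that table-per-post approach could look like, here is a version using Python's built-in `sqlite3` module. SQLite is swapped in purely as a stand-in for whatever SQL engine would actually be used, and the column names, helper functions and sample headers are all hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an arbitrary string (dots included) as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def create_post_table(conn: sqlite3.Connection, post_name: str) -> None:
    # One table per distinct post, created on the fly. Columns are assumed.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {quote_ident(post_name)} ("
        "message_id TEXT PRIMARY KEY, subject TEXT, part INTEGER)"
    )


def insert_headers(conn: sqlite3.Connection, post_name: str, headers) -> None:
    # Values go through placeholders; only the table name is interpolated.
    conn.executemany(
        f"INSERT INTO {quote_ident(post_name)} "
        "(message_id, subject, part) VALUES (?, ?, ?)",
        headers,
    )


def drop_post_table(conn: sqlite3.Connection, post_name: str) -> None:
    conn.execute(f"DROP TABLE IF EXISTS {quote_ident(post_name)}")


def list_tables(conn: sqlite3.Connection):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
post = "canonical_ubuntu_12_10.iso"
create_post_table(conn, post)
insert_headers(
    conn,
    post,
    [(f"<part{i}@example.invalid>", f"{post} [{i}/3]", i) for i in range(1, 4)],
)
count = conn.execute(f"SELECT COUNT(*) FROM {quote_ident(post)}").fetchone()[0]
tables_before = list_tables(conn)
drop_post_table(conn, post)
tables_after = list_tables(conn)
```

The table name has to be quoted because it comes from the post itself and contains a dot; every create and drop is also a schema change, which is part of the cost being weighed here.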

I can see this being expensive CPU and memory-wise no matter which path I take. I'm thinking that I should see acceptable performance on a dual-core with 4GB RAM and a single modern hard disk.

Of course, I can't really know until it's actually been coded, but I'm trying to take the right path from the beginning.
Edited by parityboy - 9/14/12 at 2:13pm
post #15 of 15
Thread Starter 
I've been researching this further and have come across CLucene, a C++ implementation of Apache Lucene, a full text search engine. Has anyone here used Lucene or Solr? How far do they scale in terms of the number of documents? What's the performance like?

Many thanks.