Overclock.net banner

1 - 2 of 2 Posts

·
Premium Member
Joined
·
65,162 Posts
Discussion Starter #1
I need some help converting a large dataset into a time-series. The dataset would be over 100,000 records daily so it needs to be converted into a time-series.

Here is an example of the concept. Instead of storing values for every day, a new record is only created when values change. The new record has the date range of when the value is valid:

Code:

Code:
[B]ID           AS_OF_DT         VALUE1          VALUE2[/B]
A             1/1/2010         abc                  123
A             1/2/2010         abc                   123
…
A             2/1/2010          abcd               123
…
A            3/1/2010           abcd                 1234

[B]ID         BEGIN_DT        END_DT         VALUE1       VALUE2[/B]
A        1/1/2010           2/1/2010         abc           123
A         2/1/2010          3/1/2010         abcd        123
A         3/1/2010          12/31/9999     abcd         1234
My though was to create a distinct list of ID/VALUE1/VALUE2 then find the min(AS_OF_DT) and max(AS_OF_DT). However, the issue I am having is when values match but there is a time period in between that does not like:

Code:

Code:
[B]ID         AS_OF_DT       VALUE1         VALUE2[/B]         
A         1/1/2010         abc         123         
A         2/1/2010         abc         123         
A         3/1/2010         abcd         1234         
A         4/1/2010         abc         123

Desired output:                                    
[B]ID         BEGIN_DT         END_DT         VALUE1      VALUE2[/B]
A         1/1/2010         3/1/2010         abc         123
A         3/1/2010         4/1/2010         abcd        1234
A         4/1/2010         12/31/9999       abc         123
Any ideas?
 

·
Registered
Joined
·
175 Posts
I'm not certain what kind of database environment you are working with, but in PostgreSQL you can use trigger function to simplify this work (I checked and MySQL supports triggers too).

So instead creating an entry, say once a month as your examples seem to show, use a trigger function to create an entry every time the value is changed.

A trigger can be created to execute when data in a table is INSERTed, UPDATEd, or DELETEd, and it is then executed once per row of data affected (in PostgreSQL, not sure about MySQL). Once in the context of the trigger you have access to both the old and new values for a row (obviously those are dependent on the action being done) and you can have it INSERT data into other tables or anything else for that matter.

So I would suggest creating a trigger that is executed when data is UPDATEd and have it insert the times as well as the new and old values.
 
1 - 2 of 2 Posts
Top