Did I create a new NoSQL database environment?

When I analyze datasets for clients (say, Internet click log files), I typically write a Perl script that summarizes the click log data into a "fact" table stored in memory (actually a hash table), with a number of auxiliary lookup tables also stored in memory as hash tables. Because I use Perl, memory allocation is transparent. Occasionally, with large data sets, I summarize the data into multiple hash tables, one held in memory and the others stored as files, with the data processed in turn or in parallel (similar to MapReduce).

When using multivariate keys such as (day, Referral ID), I use instructions such as

$hash_FactTable_DayReferral_ClickCount{"07/12/2012|4387"}++;

to update my tables.
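
As a rough sketch of how such a table might be populated, the script below assumes a hypothetical tab-separated log format (day, referral ID, URL). The sample lines and the names %clicks_by_day_referral, %seen_day and %seen_referral are made up for illustration and are not the original script.

```perl
use strict;
use warnings;

# Hypothetical sample log lines: day, referral ID, clicked URL (tab-separated).
my @log_lines = (
    "07/12/2012\t4387\t/home",
    "07/12/2012\t4387\t/pricing",
    "07/12/2012\t5120\t/home",
    "07/13/2012\t4387\t/home",
);

my %clicks_by_day_referral;   # "fact" table keyed by "day|referralID"
my %seen_day;                 # lookup table of distinct days
my %seen_referral;            # lookup table of distinct referral IDs

for my $line (@log_lines) {
    my ($day, $referral_id) = split /\t/, $line;
    # The entry is created on first use, so pairs that never occur take no space.
    $clicks_by_day_referral{"$day|$referral_id"}++;
    $seen_day{$day}              = 1;
    $seen_referral{$referral_id} = 1;
}

printf "%s => %d\n", $_, $clicks_by_day_referral{$_}
    for sort keys %clicks_by_day_referral;
```

The two lookup hashes play the role of %hash_Day and %hash_Referral in the retrieval loop.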

Null entries are never created, and all "joins" are quite efficient. There's no SQL code involved, just straight Perl. To retrieve data, I use loops such as

foreach $day (keys(%hash_Day)) {
  foreach $referralID (keys(%hash_Referral)) {
    $key="$day|$referralID";
    $clickCount=$hash_FactTable_DayReferral_ClickCount{$key};
    ...
  }
}
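
The nested loop visits every (day, referral) combination, including pairs that never occurred together. One possible alternative, sketched here with made-up data and hypothetical variable names, is to walk only the keys that exist in the fact table and split them back into their components, for example to roll clicks up by day.

```perl
use strict;
use warnings;

# Made-up fact table for illustration; keys are "day|referralID".
my %clicks_by_day_referral = (
    "07/12/2012|4387" => 2,
    "07/12/2012|5120" => 1,
    "07/13/2012|4387" => 1,
);

my %clicks_by_day;   # roll-up: total clicks per day

for my $key (sort keys %clicks_by_day_referral) {
    # The pipe must be escaped: in a regex, a bare | means alternation.
    my ($day, $referral_id) = split /\|/, $key;
    my $click_count = $clicks_by_day_referral{$key};
    $clicks_by_day{$day} += $click_count;
}

print "$_: $clicks_by_day{$_}\n" for sort keys %clicks_by_day;
```

This only works if neither key component can contain the separator character, which is an assumption about the data.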

Does this qualify as a NoSQL database environment? Is it very different from MongoDB or some other NoSQL products available on the market? Note that I store my summary files (the hash tables) as text files.
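
For the text-file storage, a minimal sketch could look like the following. The one-line-per-entry "key, tab, value" layout is an assumption for illustration, not necessarily the format I use, and the temporary file stands in for a real summary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up summary table; keys are "day|referralID".
my %clicks = (
    "07/12/2012|4387" => 2,
    "07/13/2012|4387" => 1,
);

# Write one "key<TAB>value" line per entry to a temporary text file.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\t$clicks{$_}\n" for sort keys %clicks;
close $out or die "Cannot close $path: $!";

# Read the file back into a fresh hash.
my %reloaded;
open my $in, '<', $path or die "Cannot open $path: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($key, $count) = split /\t/, $line, 2;
    $reloaded{$key} = $count;
}
close $in;

print "$_ => $reloaded{$_}\n" for sort keys %reloaded;
```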

Am I re-inventing the wheel? Or am I using an inefficient system? It seems that I do my analyses much faster than anybody else, thanks to this system.

Originally posted on Analytic Bridge
