
Mike Beneth

Male

Field of Expertise: Hadoop, Other

Professional Status: Technical, Other

Interests: Other

Activity Feed
Originally posted on Data Science Central. RethinkDB is an open source NoSQL database that stores JSON documents. This can be great for open-ended data analytics. The company officially provides drivers for Ruby, Python and NodeJS and community suppor…
Originally posted on Data Science Central. Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular graph databases that can be used to make recommendations, get social, find paths, uncover fraud, manage networks, and so o…
Originally posted on Data Science Central. In this post, we will look at the various 'Shrines' and 'Giants' on whose shoulders most modern Data Scientists stand. I am often daunted by the Job Descriptions people come up with for Data Scientists these…
Summary: NewSQL is alive and well and, under the right circumstances, could be your best choice.
No, this is not a misprint. Yes, we mean NewSQL, not NoSQL. Recently a colleague asked me about NewSQL and I had to admit that I hadn’t kept up. While i…
Summary: Column Oriented DBs excel at OLAP and are efficient at partial updates. Many folks believe that Hadoop is the original NOSQL database and it is the first that was available commercially in 2008. But Hadoop grew out of a research paper publ…
As individual users, we no longer live in the world of one computer; rather, we live with distributed smart devices. Let us first explore a Single-User/Single-Server Architecture that we used in those times, which we survived, w…
Summary:  This blog series is designed to help you understand which NOSQL Big Data database is right for you.  It is addressed to business executives and managers who need a primer on how this decision should be made. 

Starting a Big Data Initiativ…
When I analyze datasets for clients (say Internet click logfiles), I typically write a Perl script, summarize click log data into a "fact" table stored in memory - actually stored as a hash table - with a number of auxiliary lookup tables also stor…
Here are some important features that I think all databases should have: Offering a text standardization function to help with (1) data cleaning, (2) reducing volume of text information, (3) merging data from different sources or having different charact…
Many healthcare companies are aligning their long-term goals to collect data from various streams into a logical data warehouse to become more competent and increase their operating margins (James Manyika, 2011). A data-lake in HDFS (Hadoop distributed file…
Guest blog post by Michael Walker. Apache Hadoop announced a beta release for Hadoop 2. The Hadoop-2.1.0-beta provides the following key enhancements: Integration testing - with entire Hadoop ecosystem including HBase, Pig and Hive. API & protocol stabi…
Guest blog post by Fawad Alam. Note: Opinions expressed are solely my own and do not express the views or opinions of my employer. As a data scientist who has been munging data and building machine learning models in tools like R, Python and other soft…
Guest blog post by John Fairweather. In following the big data 'buzz' and trends, it appears that there is a disconnect between our analytical goals (i.e., the types of questions our customers are trying to answer) and the computational substrate on w…
Guest blog post by Jos Verwoerd. Big Data holds a big promise. But has that promise paid out already? Or are you heading for Big Dollar Disaster? Many take inventory of their data and find out they have terabytes of data lying around. Surely something…
Guest blog post by Vincent Granville. Big Data: Principles and best practices of scalable realtime data systems. Nathan Marz and Sam Ritchie. MEAP Began: January 2012. Softbound print: Summer 2012 | 425 pages. ISBN: 9781617290343. Pre-Order options* Order today…