
HDFS vs. HBase: All you need to know


The sudden increase in the volume of data from the order of gigabytes to zettabytes has created the need for a more organized file system for storing and processing data. The demand from the data market has brought Hadoop into the limelight, making it one of the biggest players in the industry. The Hadoop Distributed File System (HDFS), Hadoop's well-known file system, and HBase, Hadoop's database, are among the most topical and advanced data storage and management systems available in the market.

What are HDFS and HBase?

HDFS is fault-tolerant by design and supports rapid data transfer between nodes even during system failures. HBase is an open source, non-relational (NoSQL) database that runs on top of Hadoop. Under the CAP theorem (Consistency, Availability, and Partition tolerance), HBase is a CP system: it favors consistency and partition tolerance over availability.

HDFS is most suitable for…

Read more…


By now, you have probably heard of the Hadoop Distributed File System (HDFS), especially if you are a data analyst or someone who is responsible for moving data from one system to another. However, what benefits does HDFS have over relational databases?

HDFS is a scalable, open source solution for storing and processing large volumes of data. HDFS has been proven to be reliable and efficient across many modern data centers.

HDFS utilizes commodity hardware along with open source…

Read more…

Data is a key asset of any company, particularly transactional data, which holds business secrets such as financial or health records. Data is most vulnerable in transit between the server that stores it and the client that requests it.

The standard approach to ensuring security is to encrypt data on the server and use the SSL-enabled HTTPS protocol to secure data in transport. However, what if we could increase the level of security even further, by using HTTPS and sending data in an encrypted format over the communication line, only to decrypt data on clients who have valid certificates? That approach would make a traditional man-in-the-middle (MITM) attack much more difficult.…
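The idea can be sketched with symmetric encryption layered above the transport. This is a conceptual illustration only, not the article's implementation; it uses the third-party `cryptography` package, and the payload and key handling are invented for the example (in practice the key would be provisioned only to clients holding valid certificates):

```python
# Conceptual sketch of the "encrypt above the transport" idea: the server
# encrypts the payload before sending, so the bytes on the wire are opaque
# even if TLS were stripped. Uses the third-party `cryptography` package;
# the payload and key handling below are invented for illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, provisioned only to
server = Fernet(key)                 # clients with valid certificates
ciphertext = server.encrypt(b"account=42;balance=100.00")

# `ciphertext` is what travels over the HTTPS connection.
client = Fernet(key)
plaintext = client.decrypt(ciphertext)
print(plaintext.decode())  # account=42;balance=100.00
```

A man-in-the-middle who intercepts the HTTPS stream would then still face the inner ciphertext, which only a properly provisioned client can open.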

Read more…

Guest blog post by Bill Vorhies

Summary:  The shortage of data scientists is driving a growing number of developers to fully Automated Predictive Analytic platforms.  Some of these offer true One-Click Data-In-Model-Out capability, playing to Citizen Data Scientists with limited or no data science expertise.  Who are these players and what does it mean for the profession of data science?


In a recent poll the question was raised “Will Data Scientists be replaced by software, and if so, when?”  The consensus answer:

Data Scientists automated and unemployed by 2025.

Are we really just grist for the AI mill?  Will robots replace us?

As part of…

Read more…

Ten top languages for crunching Big Data

Guest blog post by Bernard Marr

With an ever-growing number of businesses turning to Big Data and analytics to generate insights, there is a greater need than ever for people with the technical skills to apply analytics to real-world problems.

Computer programming is still at the core of the skillset needed to create algorithms that can crunch through whatever structured or unstructured data is thrown at them. Certain languages have proven themselves better at this task than others. Here’s a brief overview of 10 of the most popular and widely used.…


Read more…

Find below a list of Hadoop interview questions and answers, jotted down to help job seekers.

Question: What is Hadoop and how does it work?

Answer: When "Big Data" emerged as a problem, Apache Hadoop evolved as an answer to it. Apache Hadoop is a framework which offers us various tools to store and process Big Data. It helps in analysing Big Data and making business decisions out of it, which cannot be done efficiently and effectively using traditional systems.

Question: What is Hadoop used for?

Answer: With Hadoop, the user can run applications on systems that have thousands of nodes spanning countless terabytes. Rapid data processing and transfer among nodes enables continuous operation even when a node fails, averting system failure.
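The fault tolerance described in this answer comes from replication: each block of a file is stored on several nodes, so reads fall back to a surviving replica when a node dies. A toy sketch of that idea (not Hadoop code; the node and block names are invented):

```python
# Toy sketch (not Hadoop code) of block replication: each block of a file
# is stored on several nodes, so when one node fails a read simply falls
# back to a surviving replica. Node and block names are invented.
nodes = {
    "node1": {"blk-1": b"first half "},
    "node2": {"blk-1": b"first half ", "blk-2": b"second half"},
    "node3": {"blk-2": b"second half"},
}

def read_block(block_id, alive):
    """Return the block from the first surviving node holding a replica."""
    for name in alive:
        if block_id in nodes[name]:
            return nodes[name][block_id]
    raise IOError(f"all replicas of {block_id} lost")

# node2 has failed, yet the whole file is still readable:
alive = ["node1", "node3"]
data = read_block("blk-1", alive) + read_block("blk-2", alive)
print(data.decode())  # first half second half
```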

Question: On what concept does the Hadoop framework run?

Answer: The Hadoop framework works on the following two core…

Read more…

By Andrew Belousoff

Suppose you need to create a high-load project based on a PHP MVC framework. You would probably use caching wherever possible. Maybe you would build the project in a single file, or maybe even write your own MVC framework with minimal functionality, or rewrite some parts of another framework. While, yes, this works, it’s a little bit tricky, isn’t it? Fortunately, there is one more solution that makes most of these manipulations unnecessary (save for the cache, perhaps), and this solution is called the PhalconPHP framework.

What Is PhalconPHP?

PhalconPHP is an MVC framework for PHP written in C and supplied as a compiled PHP extension. This is what makes it one of the fastest frameworks available (to be completely honest, the fastest one is Yaf, but it is a micro framework and…

Read more…


Recent technological advancements have improved many sectors, and analytics has not been left behind. The latest advances in the Apache Hadoop system are a major improvement to Hadoop's viability as far as data storage is concerned. With these advances, it is clear that SQL software on Hadoop is one of the best ways to access big data, which confirms the soundness of the concept of analytics in the Hadoop system. This is no doubt a big deal, and it shows how technology has evolved to fulfill the potential of big data analytics.

That said, there are a few reasons why in-cluster analytics is considered a big deal today. We highlight the top three reasons behind this buzz.

1. Most companies need it to collect, analyze, and transform data


Read more…

Data Engineering


With the rise of big data and data science, many engineering roles are being challenged and expanded. One new-age role is data engineering.

Originally, the purpose of data engineering was the loading of external data sources and the designing of databases (designing and developing pipelines to collect, manipulate, store, and analyze data).

It has since grown to support the volume and complexity of big data, so data engineering now encapsulates a wide range of skills, from web crawling and data cleansing to distributed computing and data storage and retrieval.

For data engineering and data engineers, data storage and retrieval is the critical component of the pipeline, together with how the data can be used and analyzed.
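As a toy illustration of the storage-and-retrieval end of such a pipeline, the sketch below collects raw records, cleans them, stores them in a queryable form, and reads them back. The field names and data are invented for the example, and a real pipeline would use a distributed store rather than in-memory SQLite:

```python
# Toy illustration of a data engineering pipeline's storage-and-retrieval
# stage: collect raw records, cleanse them, store them in a queryable
# form, and read them back for analysis. Names and data are invented.
import sqlite3

raw_events = [
    {"user": "alice", "pages": "3"},
    {"user": "bob", "pages": ""},    # dirty record: missing value
    {"user": "alice", "pages": "5"},
]

# Cleanse: drop records with missing values and cast types.
clean = [(r["user"], int(r["pages"])) for r in raw_events if r["pages"]]

# Store, then retrieve for analysis.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT, pages INTEGER)")
db.executemany("INSERT INTO events VALUES (?, ?)", clean)
total = db.execute(
    "SELECT SUM(pages) FROM events WHERE user = 'alice'"
).fetchone()[0]
print(total)  # 8
```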

In recent times, many new and…

Read more…

Boost Your Data Munging with R

The R language is often perceived as a language for statisticians and data scientists. Quite a long time ago, this was mostly true. However, over the years the flexibility R provides via packages has made R into a more general-purpose language. R was open sourced in 1995, and since that time repositories of R packages have been constantly growing. Still, compared to languages like Python, R is strongly centered around data.

Speaking about data, tabular data deserves particular attention, as it’s one of the most commonly used data types. It is a data type which corresponds to a table structure known in databases, where each column can be of a different type, and processing performance of that particular data type is the crucial factor for many applications.

R can be used for very efficient data munging of tabular data

In this article, we are going to present how…

Read more…

Guest blog post by Alessandro Piva

The proliferation of data and the huge potential for companies to turn data into valuable insights are steadily increasing the demand for Data Scientists.

But what skills and educational background must a Data Scientist have? What is their role within the organization? What tools and programming languages do they mostly use? These are some of the questions that the Observatory for Big Data Analytics of Politecnico di Milano is investigating through an international survey of Data Scientists: if you work with data in your company, please support us in our…

Read more…

Switching careers from Java to Big Data and Hadoop

There comes a point in all our lives when we think of switching careers or upgrading our skill sets to improve our career growth, or simply to stay current with growing trends. Careful analysis of the current trends and requirements is a good way to choose which skill set to update. Looking at the current market, Hadoop and Big Data technology are growing extremely fast and are in high demand. A surge of interest in "Big Data" is prompting many development team managers to consider Hadoop technology, as it is increasingly becoming a significant component of Big Data applications. In doing so, taking inventory of the skill sets required when dealing with Hadoop is vital. According to Helena Schwenk, analyst at MWD Advisors, a well-rounded Hadoop implementation team's skills should include experience in large-scale distributed systems and knowledge of languages such as Java, C++, Pig Latin and…

Read more…

Over the last few years, big data technology has continued to change and become more enterprise-ready in terms of usability, back-up, retrieval, and presentation. Beyond merely analyzing why something occurred, big data has allowed us to proactively drive business outcomes and make more meaningful decisions in real time.

Experts have also been studying and forecasting the current and continued growth of the big data and analytics market across the globe, particularly in data-rich industries such as financial services.

From governments using big data to protect against terrorist attacks, to medical researchers forecasting disease-spread patterns, to agricultural businesses increasing crop yields, the big data revolution is now at a stage bigger than the industrial revolution that preceded it. Here are some insights on how big data will make a bigger impact this year:

Assisting the data skills ecosystem

As marketplaces…

Read more…

Cloudflare and GitHub Pages


I have a secret that saves my clients a ton of money, keeps their website secure, and has built-in backups.

The secret: I make their website static. Then, I store and host it with GitHub, and use Cloudflare to serve it over HTTPS, and make it fast. My clients only ever pay for their domain name, yet they get a lot more than they ever bargained for.


Why Static Content?

Static sites are wonderfully fast since there’s no server processing time involved. Also, by committing a code base of static assets in a git repository, rolling back changes simply becomes a matter of reverting to a previous commit. Backups are a git push away, and you essentially serve your entire website from the cache, meaning your server will almost never have…

Read more…

By Paul Young, Freelance Software Engineer

These days, modern mobile application development requires a well thought-out plan for keeping user data in sync across various devices. This is a thorny problem with many gotchas and pitfalls, but users expect the feature, and expect it to work well.

For iOS and macOS, Apple provides a robust toolkit, the CloudKit API, which allows developers targeting Apple platforms to solve this synchronization problem.

In this article, I’ll demonstrate how to use CloudKit to keep a user’s data in sync between multiple clients. It’s intended for experienced iOS developers who are already familiar with Apple’s frameworks and with Swift. I’m going to take a fairly deep technical dive into the CloudKit API to explore ways you can leverage this technology to make awesome multi-device apps. I’ll focus on an iOS application, but the same approach can be used for macOS clients as well.


Read more…

When coming across the term “text search”, one usually thinks of a large body of text, which is indexed in a way that makes it possible to quickly look up one or more search terms when they are entered by a user. This is a classic problem for computer scientists, to which many solutions exist.

But how about a reverse scenario? What if what’s available for indexing beforehand is a group of search phrases, and only at runtime is a large body of text presented for searching? These questions are what this trie data structure tutorial seeks to address.
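The core idea can be sketched in a few lines: build a trie over the known phrases up front, then at runtime try to match a phrase starting at every position of the incoming text. This is a simplified illustration (the phrase list and sample text are invented; a production version would use something like an Aho-Corasick automaton to avoid rescanning from every position):

```python
# Simplified sketch of the reverse scenario: index the known search
# phrases in a trie up front, then scan the incoming text at runtime.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.phrase = None  # set on the node that ends a complete phrase

def build_trie(phrases):
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.phrase = phrase
    return root

def find_phrases(root, text):
    """Return every indexed phrase that occurs somewhere in `text`."""
    found = set()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.get(ch)
            if node is None:
                break
            if node.phrase is not None:
                found.add(node.phrase)
    return found

trie = build_trie(["heart disease", "diabetes", "hypertension"])
matches = find_phrases(trie, "patients with diabetes and heart disease")
print(sorted(matches))  # ['diabetes', 'heart disease']
```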



A real world application for this scenario is matching a number of medical theses against a list of medical…

Read more…

Top 10 Hadoop Interview Questions & Answers

Q1. What exactly is Hadoop?
A1. Hadoop is a Big Data framework for processing huge amounts of different types of data in parallel to achieve performance benefits.

Q2. What are the 5 Vs of Big Data?
A2. Volume – Size of the data
Velocity – Speed at which the data changes
Variety – Different types of data: structured, semi-structured, and unstructured
Veracity – Trustworthiness and quality of the data
Value – Usefulness of the data to the business

Q3. Give me examples of Unstructured data.
A3. Images, Videos, Audios etc.

Q4. Tell me about the Hadoop file system and processing framework.
A4. Hadoop's file system is called HDFS, the Hadoop Distributed File System. It consists of a NameNode, DataNodes, and a Secondary NameNode.
Hadoop's processing framework is known as MapReduce. It schedules Map and Reduce tasks in parallel to achieve efficiency.
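The MapReduce model mentioned in this answer can be illustrated with the classic word count, sketched here in plain Python rather than actual Hadoop code: the map phase emits (key, value) pairs, and the shuffle-and-reduce phase groups them by key and aggregates each group.

```python
# Plain-Python sketch (not actual Hadoop code) of the MapReduce model:
# map emits (key, value) pairs, the shuffle groups them by key, and
# reduce aggregates each group. Here: the classic word count.
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_and_reduce(pairs):
    """Shuffle: group pairs by key; reduce: sum the counts per key."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

lines = ["Hadoop stores data in HDFS",
         "MapReduce processes data in parallel"]
counts = shuffle_and_reduce(map_phase(lines))
print(counts["data"], counts["hadoop"])  # 2 1
```

In real Hadoop the map and reduce tasks run on different nodes and the shuffle moves data across the network, but the data flow is the same.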

Q5. What is the High Availability feature in Hadoop 2?
A5. In Hadoop 2, a standby (passive) NameNode is introduced to avoid the NameNode becoming a single point of failure. This results in…

Read more…

7 familiar myths regarding Big Data analytics

Big Data analytics has been generating buzz for a while now, but people still have various misconceptions about it and about the way it functions to help you transform your business goals. Irrespective of the industry you are in, your company processes a huge amount of raw data that can be refined into a more organized form.

Let's have a look at the common myths about Big Data:

1. Big Data means lots of data

When you hear "Big Data," an image of loads of data instinctively floats into your mind. But Big Data is not about having a huge bank of information that is hardly of any use; it means having quality data which is useful for your business. A huge data bank is prone to redundant and duplicate entries. Big Data analytics helps you streamline the right data, irrespective of the quantity.

2. Big Data is extremely essential

Having raw and unprocessed data is practically of no value for an organization, unless it is…

Read more…



25 Predictions About The Future Of Big Data

Guest blog post by Robert J. Abate.

In the past, I have published on the value of information, big data, advanced analytics and the Abate Information Triangle and have recently been asked to give my humble opinion on the future of Big Data.

I have been fortunate to have been on three panels recently at industry conferences which discussed this very question with such industry thought leaders as: Bill Franks (CTO, Teradata), Louis DiModugno (CDAO, AXA US), Zhongcai Zhang, (CAO, NY Community Bank), Dewey Murdick, (CAO, Department Of Homeland Security), Dr. Pamela Bonifay Peele (CAO, UPMC Insurance Services), Dr. Len Usvyat (VP Integrated Care Analytics, FMCNA), Jeffrey Bohn (Chief Science Officer, State Street), Kenneth Viciana (Business Analytics Leader, Equifax) and others.

Each brought their unique perspective to the challenges of Big Data and their insights into their…

Read more…

Featured Blog Posts - DSC