Featured Posts (351)

Ember Data (a.k.a. ember-data or ember.data) is a library for robustly managing model data in Ember.js applications. The developers of Ember Data state that it is designed to be agnostic to the underlying persistence mechanism, so it works just as well with JSON APIs over HTTP as it does with streaming WebSockets or local IndexedDB storage. It provides many of the facilities you’d find in server-side object-relational mappers (ORMs) like ActiveRecord, but is designed specifically for the unique environment of JavaScript in the browser.

While Ember Data may take some time to…

Read more…
Google formally announced Android 7.0 a few weeks ago, but as usual, you’ll have to wait for it. Thanks to the Android update model, most users won’t get their Android 7.0 over-the-air (OTA) updates for months. However, this does not mean developers can afford to ignore Android Nougat. In this article, Toptal Technical Editor Nermin Hajdarbegovic takes a closer look at Android 7.0, outlining new features and changes. While Android 7.0 is by no means revolutionary, the introduction of a new graphics API, a new JIT compiler, and a range of UI and performance tweaks will undoubtedly unlock more potential and generate a few new possibilities.
Read more…

I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for any aspiring Spark developers looking for a place to get started.

Today, Spark is being adopted by major players like Amazon, eBay, and Yahoo! Many organizations run Spark on clusters with thousands of nodes. According to the Spark FAQ, the largest known cluster has over 8000 nodes. Indeed, Spark is a technology well worth taking note of and learning about.

This article provides an introduction to Spark including use cases and examples. It contains…

Read more…

Associative Data Modeling Demystified - Part 1

Guest blog post by Athanassios Hatzis

Relation, Relationship and Association

While most players in the IT sector adopted graph or document databases and Hadoop-based solutions (Hadoop being an enabler of the HBase column store), it went almost unnoticed that several new DBMSs based on associative technology appeared on the scene, among them AtomicDB, previously the database engine of X10SYS, and Sentences. We have introduced and discussed the…

Read more…

Guest blog post by Marc Borowczak

Moving legacy data to a modern big data platform can be daunting at times. It doesn’t have to be. In this short tutorial, we’ll briefly review an approach and demonstrate it on my preferred data set. This isn’t an ML repository or a Kaggle competition data set, simply the data I accumulated over decades to keep track of my plastic model collection, and as such it definitely meets the legacy standard!

We’ll describe steps followed on a laptop VirtualBox machine running Ubuntu 16.04.1 LTS Gnome. The following steps…

Read more…

Java versus Python

Originally posted on Data Science Central

Here is an interesting picture that went viral on Facebook. We've had plenty of discussions about Python versus R on DSC. This picture tries to convince us that Python is superior to Java, using a tiny piece of code that draws a pyramid.
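The picture itself is not reproduced here. As a rough illustration of the kind of snippet it refers to, here is a minimal Python sketch that prints a text pyramid. The function name `pyramid` and the choice of `*` characters are assumptions made for this example, not details taken from the picture.

```python
def pyramid(height: int) -> str:
    """Return a centered text pyramid with `height` rows (hypothetical helper)."""
    rows = []
    for level in range(1, height + 1):
        padding = " " * (height - level)  # leading spaces that center the row
        stars = "*" * (2 * level - 1)     # 1, 3, 5, ... stars per row
        rows.append(padding + stars)
    return "\n".join(rows)


if __name__ == "__main__":
    print(pyramid(4))
```

A short snippet like this says little about runtime speed, which is the subject of the questions that follow.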

This raises several questions:

  • Is Java faster than Python? If yes, under what circumstances? And by how much? 
  • Does the speed of an algorithm depend more on the…
Read more…

Why Not So Hadoop?

Guest blog post by Kashif Saiyed

Does Big Data mean Hadoop? Not really. However, when one thinks of the term Big Data, the first thing that comes to mind is Hadoop, along with heaps of unstructured data. It is an exceptional lure: data scientists get the opportunity to work with large amounts of data to train their models, and businesses gain knowledge previously never imagined. But has it lived up to the hype? In this article, we will look at a brief history of Hadoop and see how it stands today.

2015 Hype Cycle – Gartner

 
Some key takeaways from the Hype cycle of 2015:

  1. ‘Big Data’ was at the Trough of Disillusionment stage in 2014, but is not seen in the 2015 Hype cycle.
  2. Another interesting point is that the ‘Internet of Things’, which suggests a network of interconnected devices around us, has been at the peak for 2 years consistently…
Read more…

Originally posted on Data Science Central

Summary

Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science.

About the Technology

Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started.

About the Book

Introducing Data Science explains vital data science concepts and teaches you…

Read more…

Originally posted on Data Science Central

Summary:  This is the first in a series of articles aimed at providing a complete foundation and broad understanding of the technical issues surrounding an IoT or streaming system so that the reader can make intelligent decisions and ask informed questions when planning their IoT system. 

In This Article

In Lesson 2

In Lesson 3

Is…

Read more…

Originally posted on Data Science Central

Cloud giants like Amazon, Google, Azure and IBM have rushed into the big data analytics cloud market.  They claim their tools will make developer tasks simple. For machine learning, they say their cloud products will free data scientists and developers from implementation details so they can focus on business logic.  …

Read more…

Originally posted on Data Science Central

Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out what to do to get started, even to understand what data science is about.

In this short contribution, I share what I believe to be the most valuable resources - a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free time. 

Map-Reduce Explained

These resources…

Read more…

5 Big Data Myths Businesses Should Know

Guest blog post by Larry Alton

Big data is seeping into every facet of our lives. Smart home gadgets are becoming part of the nervous systems of new and remodeled homes, and many renters are demanding these interconnected gadgets from landlords.

But nowhere has Big Data created a bigger buzz than in business. Companies of all sizes are collecting data at a seemingly insurmountable rate. Big data is larger than ever before.

We’ve collected more data in…

Read more…

Originally posted on Data Science Central

In this article, we have just started providing answers to one of the largest collections of data science job interview questions ever published, and we will continue to add answers to most of these questions. Some answers link to solutions offered in my Wiley data science book: you can find this book here. The 91 job interview questions were originally published here with no answers, and we recently added 50 questions to identify a true data scientist, …

Read more…

Guest blog post by Ankit Jain

Since its inception in 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, this market could amass a net worth of US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is anticipated to see significant growth in the future as well.
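To see how the quoted figures relate, here is a minimal sketch of the compound annual growth rate (CAGR) arithmetic. It assumes six compounding years between 2012 and 2018 and annual compounding. The function names `project_value` and `implied_cagr` are hypothetical helpers written for this example.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound `start_value` at the annual rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns `start_value` into `end_value` over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures from the text, in billions of US dollars.
    # Assumption: 2012 to 2018 counts as 6 compounding years.
    projected = project_value(1.5, 0.547, 6)
    rate = implied_cagr(1.5, 20.9, 6)
    print(f"Projected 2018 value: US${projected:.1f} billion")
    print(f"CAGR implied by US$20.9 billion: {rate:.1%}")
```

Under these assumptions the projection comes to roughly US$20.6 billion, close to the quoted US$20.9 billion. The small gap may be down to rounding or a different base-year convention in the original estimate.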

Why Hadoop?

Quite naturally, the mounting scale of unstructured data generated every single day by data-intensive industries such as telecommunication, banking and finance, social media, research, healthcare, and defence has led to the rising adoption of Hadoop solutions.

The major factors driving the need to adopt Hadoop are its cost-sensitive and scalable methodologies…

Read more…

In any hiring process, a candidate with a professional certification always gets extra attention. Here are a few of the certifications in data science.

IBM Certified Data Architect -- Big Data

Through this training, you will master the skills needed to handle big data. A data architect is expected to know the different big data technologies, understand how they differ, and integrate them to solve business problems. The certification holder will be able to plan big data processors and help with hardware and software architecture planning. The course is certified by IBM under the name IBM Big Data, and it is an added advantage in getting your resume shortlisted for interviews.

EMC Data Scientist…

Read more…
Choice is a good thing, but too much choice can lead to confusion and to buyers taking a “wait-and-see” approach until the market coalesces around the eventual winners. Lack of choice was an important factor in how quickly and readily companies bought into the RDBMS movement 30 or so years ago. I believe that too much choice is holding companies back from buying into the Hadoop / NoSQL movement.
Read more…

Hadoop, named after a toy elephant that belonged to the child of one of its inventors, is an open-source software framework. It is capable of storing colossal amounts of data and handling massive applications and jobs endlessly. Hadoop’s capabilities make it one of the most sought-after data platforms for successful businesses all over the world.

Hadoop Benefits

Because it can store and quickly process any type of data, Hadoop is light-years ahead of the game in the open-source world. Data is increasing and changing every day due to social media inventions, new mobile devices, and technological advancements. Here are a few more benefits it offers:

  • Malleability - Unlike other databases, Hadoop does not need to process data before storing it. You can store as much as you need to and then process it later. That applies to images, videos, and text as well.
  • Failure tolerance - All of your data is protected against the…
Read more…

Hadoop is the hot new technology, and SQL is the old, tried and tested tool for diving deep into big data for analysis. This is true, but the number of projects that put an SQL front end on Hadoop data stores shows that there is a real need for high-level data querying languages in the Hadoop environment. Because Hadoop MapReduce is a complicated tool for data analysis, developers came up with Pig and Hive, which are similar to SQL and make it easy to analyze data on Hadoop without the need for coding in Java. It is important to understand how these differ from each other so that each can be used for the right use case.
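The contrast between hand-written MapReduce steps and a declarative query can be sketched in a few lines. This is not Hadoop, Pig, or Hive code: plain Python functions stand in for the map, shuffle, and reduce phases, and the standard-library `sqlite3` module stands in for an SQL-like front end. The `VISITS` records and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (visitor, page) pairs.
VISITS = [("ann", "home"), ("bob", "home"), ("ann", "cart"), ("ann", "home")]


def map_phase(records):
    """Emit a (key, 1) pair for every record."""
    for visitor, _page in records:
        yield visitor, 1


def shuffle_phase(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_with_mapreduce(records):
    """Count visits per visitor with explicit map, shuffle, and reduce steps."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def count_with_sql(records):
    """Count visits per visitor with a single declarative GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT visitor, COUNT(*) FROM visits GROUP BY visitor"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(count_with_mapreduce(VISITS))
    print(count_with_sql(VISITS))
```

Both functions return the same counts. The query version states what result is wanted, while the MapReduce version spells out how to compute it, which is the gap that SQL-like layers aim to close.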

In the present age of Big Data, a number of querying options are available. While the old giant SQL continues to reign supreme, organizations’ affinity for open source programming and querying languages to tame Big Data has created plenty of space for Apache-based Pig and Hive. Choosing the right…

Read more…
