Guest blog post by Kumar Chinnakali
Celebrate the Big Data Problems – #1
Every day we face many big data problems, whether in production, in PoCs, or elsewhere. Is there a common repository where these problems are collected and shared? As far as we know, there isn't one. As always, dataottam wants to share its learnings with the community so that others facing the same or similar problems can celebrate them with us. And if you have a new kind of big data problem, we can debate it and experiment together to celebrate it.
So we at dataottam have started a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series we will share our big data problems using the CPS (Context, Problem, Solution) framework.
Whether we are moving a small collection of selfies between apps or moving very large data sets, data transfer remains a challenge. Hadoop is one big data problem solver, but transferring data to and from relational databases remained a challenge even after Hadoop arrived. Hence Sqoop (SQL to Hadoop) was created to perform bidirectional data transfer between Hadoop and external structured data sources such as relational databases.
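To make "bidirectional" concrete, here is a minimal sketch that only assembles Sqoop command lines in Python; it does not run them. `sqoop import` moves a database table into Hadoop (here straight into Hive with `--hive-import`), and `sqoop export` pushes files from an HDFS directory back into a database table. The JDBC URL, user name, table names, HDFS path and helper function names are all hypothetical, authentication options such as `-P` are omitted, and the export assumes the files are comma-delimited, which we understand to be Sqoop's default layout.

```python
import shlex

# Hypothetical connection details; substitute your own database and user.
JDBC_URL = "jdbc:mysql://db.example.com/sales"
DB_USER = "etl_user"


def sqoop_import_to_hive(table, hive_table):
    """RDBMS -> Hadoop: build argv that copies a database table into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "--table", table,
        "--hive-import",              # load the imported files into Hive
        "--hive-table", hive_table,
    ]


def sqoop_export_from_hdfs(table, export_dir):
    """Hadoop -> RDBMS: build argv that pushes an HDFS directory into a database table."""
    return [
        "sqoop", "export",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "--table", table,
        "--export-dir", export_dir,   # hypothetical path holding comma-delimited files
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(sqoop_import_to_hive("orders", "orders")))
    print(shlex.join(sqoop_export_from_hdfs("orders_summary", "/data/orders_summary")))
```

Passing one of these lists to `subprocess.run(cmd, check=True)` on a machine with Sqoop installed would execute it; building an argument list rather than a single shell string avoids quoting problems.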
How can we replace special delimiter characters with a required string during a Hive import (ingest) from a relational database?
If we import the data with the --hive-import option and then check the record count in the destination, we find more records than in the source, because string fields in the source contain characters that Hive treats as delimiters.
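The row-count inflation can be reproduced without a cluster. The sketch below is a pure-Python simulation, not Sqoop or Hive code; the helper names `to_hive_text` and `hive_row_count` and the sample rows are hypothetical. It assumes Hive's default text delimiters (Ctrl-A, `\x01`, between fields and newline between rows) and counts rows the way a line-based reader does, splitting on `\n`, `\r` or `\r\n`. The `replacement` argument mimics Sqoop's `--hive-delims-replacement` option, and an empty replacement mimics `--hive-drop-import-delims`; per the Sqoop documentation, both options act on `\n`, `\r` and `\01` inside string fields.

```python
import re

# Hive's default text-table delimiters (assumed for this simulation).
FIELD_DELIM = "\x01"                # Ctrl-A between columns
ROW_DELIM = "\n"                    # newline between rows
HIVE_SPECIAL = ("\n", "\r", "\x01")  # characters Hive would misread inside a value


def to_hive_text(rows, replacement=None):
    """Serialize rows as Hive default-delimited text (a simulation, not Sqoop).

    replacement=None keeps values unchanged (the problematic case).
    A string such as " " mimics --hive-delims-replacement; "" mimics
    --hive-drop-import-delims.
    """
    lines = []
    for row in rows:
        fields = []
        for value in row:
            text = str(value)
            if replacement is not None:
                # Replace each special character individually.
                for ch in HIVE_SPECIAL:
                    text = text.replace(ch, replacement)
            fields.append(text)
        lines.append(FIELD_DELIM.join(fields))
    return ROW_DELIM.join(lines) + ROW_DELIM


def hive_row_count(data):
    """Count records as a line reader would: split on CR LF, CR or LF."""
    pieces = re.split(r"\r\n|\r|\n", data)
    if pieces and pieces[-1] == "":
        pieces.pop()  # the final terminator does not start a new record
    return len(pieces)


source_rows = [
    (1, "plain comment"),
    (2, "multi-line\ncomment"),
    (3, "windows\r\nline ending"),
]

print(len(source_rows))                                            # 3
print(hive_row_count(to_hive_text(source_rows)))                   # 5
print(hive_row_count(to_hive_text(source_rows, replacement=" ")))  # 3
```

On a real import, the corresponding fix would be to add `--hive-delims-replacement ' '` (or `--hive-drop-import-delims`) to the `sqoop import ... --hive-import` command, choosing one option or the other. A replacement string keeps the word boundaries readable, while dropping the characters can glue words together, as with `windowsline ending` above.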
In the next blog, Celebrate the Big Data Problems – #2, we will share the big data problem “How to identify the number of buckets for a Hive table while executing HiveQL DDLs”.
As always, please feel free to write to us at coffee [at] dataottam [dot] com.