
10 tools and platforms for data preparation

Guest blog post by Zygimantas Jacikevicius

Traditional approaches to enterprise reporting, analysis and Business Intelligence, such as Data Warehousing, upfront modelling and ETL, have given way to new, more agile tools and ideas. Within this landscape, Data Preparation tools have become very popular, and for good reason. Data preparation has traditionally been a manual, error-prone task that consumed the bulk of most data projects' time, with profiling, standardising and transforming data all done by hand. This has derailed many Data Warehousing and analysis projects, which became bogged down in infrastructure and consistency issues rather than focusing on the true value add: producing good quality analysis.

Fortunately, the latest generation of tools, typically powered by NoSQL technologies, takes a lot of this pain away. They enable users with reasonable technical skills to rapidly explore, understand and analyse datasets ranging from small data to data that is petabytes in scale. Most tools also feature a range of adaptors, meaning that structured and semi-structured sources such as spreadsheets, database tables and XML / JSON content can also be explored and analysed.
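To make the manual work concrete, here is a minimal sketch of the profiling and standardising steps described above, written by hand in plain Python. The sample records, the `COUNTRY_ALIASES` lookup and the function names are all hypothetical and chosen purely for illustration; they are not taken from any of the tools listed below.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample data: the same kind of record arriving from two sources.
CSV_SOURCE = "name,country\nAlice , uk\nBob,UK\n"
JSON_SOURCE = (
    '[{"name": "carol", "country": "United Kingdom"},'
    ' {"name": "Dan", "country": null}]'
)

# Hypothetical lookup used to standardise country values to a single code.
COUNTRY_ALIASES = {"uk": "GB", "united kingdom": "GB", "gb": "GB"}


def load_records():
    """Read both sources into one list of plain dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def standardise(record):
    """Trim whitespace, title-case names and map country aliases to one code."""
    name = (record.get("name") or "").strip().title()
    country = (record.get("country") or "").strip().lower()
    # Unknown or missing countries become None so they show up when profiling.
    return {"name": name, "country": COUNTRY_ALIASES.get(country)}


def profile(records, field):
    """Count the distinct values of a field, including missing ones (None)."""
    return Counter(r[field] for r in records)


records = [standardise(r) for r in load_records()]
country_profile = profile(records, "country")
print(records)
print(country_profile)
```

Even for four records and two fields, the cleaning rules have to be written and maintained by hand, which is the effort that data preparation tools aim to reduce.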

It’s never been easier to rapidly derive value from disparate data. Here are 10 top tools that have impressed the consultants at Data to Value.

For more blogs, webinars, videos or data management solutions please visit our website www.datatovalue.co.uk

 1. Paxata

Paxata is a self-service adaptive data preparation platform that lets analysts quickly and painlessly collect, explore, combine and transform data. It offers high flexibility, as it does not require pre-defined models when analysing raw data. It also works with a wide variety of formats and data management systems, so users can easily see relationships across data sets.

 2. Alteryx

Alteryx is a tool that enables a user to blend data from different sources in one seamless workflow. Alteryx minimises the need for extensive data preparation, enabling users to easily access the data they need. It can handle structured and unstructured data in different formats and from various sources. It also makes it easy for users with different expertise to collaborate on a single workflow and solve problems more efficiently.

3. Lavastorm

Lavastorm Analytics Engine helps business users to self-serve large data sets from virtually any source and in any format, making quick business decisions easy without rigorous modelling, scripting or planning. Users can quickly create and automate data preparation with a wide variety of data set blending options, without IT support. It also supports sharing, for even greater productivity.

 4. SAP Lumira

SAP Lumira helps users to acquire, manipulate and visualise complex and large data sets across a wide range of sources and formats in the same view. This makes it possible to produce useful analytics in attractive visualisations that Business Objects users will find very familiar. A good choice for those seeking an enterprise-strength tool.

 5. Platfora

Platfora is a visually rich and very advanced end-to-end solution for business analysis, built on the Hadoop infrastructure with features such as in-memory computing. It makes use of many partner tools within the Big Data ecosystem and enables users to explore data quickly and efficiently without custom code. This saves time and ensures that insights reflect the most recent data. Users can interact with various sets of multi-structured data and ask emerging questions in a seamless manner.

 6. Teradata Loom

Teradata Loom provides a data management tool for the Hadoop data lake. Loom enables users to rapidly find, prepare and analyse data within a Hadoop cluster. With Loom you can reuse existing data filters and use a framework called “Active Scan”, which constantly catalogues and profiles data in HDFS and Hive.

 

 7. DataWatch

DataWatch provides a visual platform for business analytics. It offers an all-in-one tool for data cleansing, transformation and preparation from structured and unstructured datasets. It allows users to discover data in real time and execute dynamic queries according to business needs.

 

 8. Datameer

Datameer is a big data analytics platform purpose-built for Hadoop. It combines self-service data, analytics and infographics in a way that is useful and easy for stakeholders to interpret. It provides a single end-to-end workflow to simplify the big data analytics process.

 

 9. Tamr

Tamr connects and enriches data, making it quicker to leverage and reducing the effort needed to access it. It uses advanced algorithms, machine learning and human guidance to resolve any uncertainty. It continually builds a data inventory and an expert directory while enhancing data assets for useful insights.

 10. RapidMiner Studio

RapidMiner Studio is a popular open source predictive analytics platform that grew out of the Data Mining community. The platform provides all of the necessary tools for a mature data mining process. It provides accurate pre-processing, supports multiple interfaces and executes a wide range of operations, from data preparation to model building and validation.

  
