Guest blog post by Mirko Krivanek
This is an interesting list compiled by Bernard Marr. I would add the following sources:
- DataScienceCentral selection of big data sets (see the first bulleted list on the linked page)
- Data sets used in our data science apprenticeship, including both real and simulated data, plus tips for creating rich, large artificial data sets for testing models
- KDNuggets repository
- Data sets used in Kaggle competitions
Bernard's selection:
- Data.gov
- US Census Bureau
- European Union Open Data Portal
- Data.gov.uk
- The CIA World Factbook
- Healthdata.gov
- NHS Health and Social Care Information Centre
- Amazon Web Services public datasets
- Facebook Graph
- Gapminder
- Google Trends
- Google Finance
- Google Books Ngrams
- National Climatic Data Center
- DBPedia
- Topsy
- Likebutton
- New York Times
- Freebase
- Million Song Data Set
Read the original article for a description of each data repository.