Guest blog post by Mirko Krivanek
Here is a selection of external and internal papers covering various technical aspects of data science and big data. Feel free to add your favorites.
Complex Open Text Analysis (source: Avinash Kaushik)
- Bigtable: A Distributed Storage System for Structured Data
- A Few Useful Things to Know about Machine Learning
- Random Forests
- A Relational Model of Data for Large Shared Data Banks
- Map-Reduce for Machine Learning on Multicore
- Pasting Small Votes for Classification in Large Databases and On-Line
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering
- Recursive Deep Models for Semantic Compositionality Over a Sentimen...
- Spanner: Google's Globally-Distributed Database
- Megastore: Providing Scalable, Highly Available Storage for Interac...
- F1: A Distributed SQL Database That Scales
- APACHE DRILL: Interactive Ad-Hoc Analysis at Scale
- A New Approach to Linear Filtering and Prediction Problems
- Top 10 Algorithms in Data Mining
- The PageRank Citation Ranking: Bringing Order to the Web
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- Amazon's Dynamo
DSC Internal Papers
- How to detect spurious correlations, and how to find the ...
- Automated Data Science: Confidence Intervals
- 16 analytic disciplines compared to data science
- From the trenches: 360-degree data science
- 10 types of regressions. Which one to use?
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- Jackknife logistic and linear regression for clustering and predict...
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict...
- Internet topology mapping
- 11 Features any database, SQL or NoSQL, should have
- 10 Features all Dashboards Should Have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- What Map Reduce can't do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- The curse of big data
- Interesting Data Science Application: Steganography
- 50 Articles about Hadoop and Related Topics
- 10 Modern Statistical Concepts Discovered by Data Scientists
- Top data science keywords on DSC
- 4 easy steps to becoming a data scientist
- 13 New Trends in Big Data and Data Science
- 22 tips for better data science
- Data Science Compared to 16 Analytic Disciplines
- How to detect spurious correlations, and how to find the real ones
- 17 short tutorials all data scientists should read (and practice)
- 10 types of data scientists
- 66 job interview questions for data scientists
- High versus low-level data science