Guide to High Performance Distributed Computing: Case Studies with Hadoop, Scalding and Spark (Computer Communications and Networks), 1st Edition, by K.G. Srinivasa and Anil Kumar Muppalla
Product details:
ISBN 10: 3319134965
ISBN 13: 9783319134963
Author: K.G. Srinivasa; Anil Kumar Muppalla
This timely text/reference describes the development and implementation of large-scale distributed processing systems using open-source tools and technologies. Comprehensive in scope, the book presents state-of-the-art material on building high performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks. Features: describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high performance distributed computing; presents an overview of the Hadoop ecosystem, followed by step-by-step instruction on its installation, programming and execution; reviews the basics of Spark, including resilient distributed datasets, and examines Hadoop streaming and working with Scalding; provides detailed case studies on approaches to clustering, data classification and regression analysis; explains the process of creating a working recommender system using Scalding and Spark.
Table of contents:
Part I Programming Fundamentals of High Performance Distributed Computing
1 Introduction
1.1 Distributed Systems
1.2 Types of Distributed Systems
1.2.1 Distributed Embedded System
1.2.2 Distributed Information System
1.2.3 Distributed Computing Systems
1.3 Distributed Computing Architecture
1.4 Distributed File Systems
1.4.1 DFS Requirements
1.4.2 DFS Architecture
1.5 Challenges in Distributed Systems
1.6 Trends in Distributed Systems
1.7 Examples of HPDC Systems
References
2 Getting Started with Hadoop
2.1 A Brief History of Hadoop
2.2 Hadoop Ecosystem
2.3 Hadoop Distributed File System
2.3.1 Characteristics of HDFS
2.3.2 Namenode and Datanode
2.3.3 File System
2.3.4 Data Replication
2.3.5 Communication
2.3.6 Data Organization
2.4 MapReduce Preliminaries
2.5 Prerequisites for Installation
2.6 Single Node Cluster Installation
2.7 Multi-node Cluster Installation
2.8 Hadoop Programming
2.9 Hadoop Streaming
References
3 Getting Started with Spark
3.1 Overview
3.2 Spark Internals
3.3 Spark Installation
3.3.1 Pre-requisites
3.3.2 Getting Started
3.3.3 Example: Scala Application
3.3.4 Spark with Python
3.3.5 Example: Python Application
3.4 Deploying Spark
3.4.1 Submitting Applications
3.4.2 Standalone Mode
References
4 Programming Internals of Scalding and Spark
4.1 Scalding
4.1.1 Installation
4.1.2 Programming Guide
4.2 Spark Programming Guide
References
Part II Case studies using Hadoop, Scalding and Spark
5 Case Study I: Data Clustering using Scalding and Spark
5.1 Introduction
5.2 Clustering
5.2.1 Clustering Techniques
5.2.2 Clustering Process
5.2.3 K-Means Algorithm
5.2.4 Simple K-Means Example
5.3 Implementation
5.3.1 Scalding Implementation
Problems
References
6 Case Study II: Data Classification using Scalding and Spark
6.1 Classification
6.2 Probability Theory
6.2.1 Random Variables
6.2.2 Distributions
6.2.3 Mean and Variance
6.3 Naive Bayes
6.3.1 Probability Model
6.3.2 Parameter Estimation and Event Models
6.3.3 Example
6.4 Implementation of Naive Bayes Classifier
6.4.1 Scalding Implementation
6.4.2 Results
Problems
References
7 Case Study III: Regression Analysis using Scalding and Spark
7.1 Steps in Regression Analysis
7.2 Implementation Details
7.2.1 Linear Regression: Algebraic Method
7.2.2 Scalding Implementation
7.2.3 Spark Implementation
7.2.4 Linear Regression: Gradient Descent Method
7.2.5 Scalding Implementation
7.2.6 Spark Implementation
Problems
References
8 Case Study IV: Recommender System using Scalding and Spark
8.1 Recommender Systems
8.1.1 Objectives
8.1.2 Data Sources for Recommender Systems
8.1.3 Techniques used in Recommender Systems
8.2 Implementation Details
8.2.1 Spark Implementation
8.2.2 Scalding Implementation