Book Information


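Programming Hive (English edition)
  • Authors: Edward Capriolo, Dean Wampler, Jason Rutherglen
  • Publisher: Nanjing: Southeast University Press
  • ISBN: 9787564141974
  • Publication year: 2013
  • Listed page count: 332 pages
  • File size: 119MB
  • File page count: 350 pages
  • Subject terms: Database systems - Programming - English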


Table of Contents

1. Introduction

An Overview of Hadoop and MapReduce

Hive in the Hadoop Ecosystem

Pig

HBase

Cascading, Crunch, and Others

Java Versus Hive: The Word Count Algorithm

What's Next

2. Getting Started

Installing a Preconfigured Virtual Machine

Detailed Installation

Installing Java

Installing Hadoop

Local Mode, Pseudodistributed Mode, and Distributed Mode

Testing Hadoop

Installing Hive

What Is Inside Hive?

Starting Hive

Configuring Your Hadoop Environment

Local Mode Configuration

Distributed and Pseudodistributed Mode Configuration

Metastore Using JDBC

The Hive Command

Command Options

The Command-Line Interface

CLI Options

Variables and Properties

Hive "One Shot" Commands

Executing Hive Queries from Files

The .hiverc File

More on Using the Hive CLI

Command History

Shell Execution

Hadoop dfs Commands from Inside Hive

Comments in Hive Scripts

Query Column Headers

3. Data Types and File Formats

Primitive Data Types

Collection Data Types

Text File Encoding of Data Values

Schema on Read

4. HiveQL: Data Definition

Databases in Hive

Alter Database

Creating Tables

Managed Tables

External Tables

Partitioned, Managed Tables

External Partitioned Tables

Customizing Table Storage Formats

Dropping Tables

Alter Table

Renaming a Table

Adding, Modifying, and Dropping a Table Partition

Changing Columns

Adding Columns

Deleting or Replacing Columns

Alter Table Properties

Alter Storage Properties

Miscellaneous Alter Table Statements

5. HiveQL: Data Manipulation

Loading Data into Managed Tables

Inserting Data into Tables from Queries

Dynamic Partition Inserts

Creating Tables and Loading Them in One Query

Exporting Data

6. HiveQL: Queries

SELECT ... FROM Clauses

Specify Columns with Regular Expressions

Computing with Column Values

Arithmetic Operators

Using Functions

LIMIT Clause

Column Aliases

Nested SELECT Statements

CASE ... WHEN ... THEN Statements

When Hive Can Avoid MapReduce

WHERE Clauses

Predicate Operators

Gotchas with Floating-Point Comparisons

LIKE and RLIKE

GROUP BY Clauses

HAVING Clauses

JOIN Statements

Inner JOIN

Join Optimizations

LEFT OUTER JOIN

OUTER JOIN Gotcha

RIGHT OUTER JOIN

FULL OUTER JOIN

LEFT SEMI-JOIN

Cartesian Product JOINs

Map-side Joins

ORDER BY and SORT BY

DISTRIBUTE BY with SORT BY

CLUSTER BY

Casting

Casting BINARY Values

Queries that Sample Data

Block Sampling

Input Pruning for Bucket Tables

UNION ALL

7. HiveQL: Views

Views to Reduce Query Complexity

Views that Restrict Data Based on Conditions

Views and Map Type for Dynamic Tables

View Odds and Ends

8. HiveQL: Indexes

Creating an Index

Bitmap Indexes

Rebuilding the Index

Showing an Index

Dropping an Index

Implementing a Custom Index Handler

9. Schema Design

Table-by-Day

Over Partitioning

Unique Keys and Normalization

Making Multiple Passes over the Same Data

The Case for Partitioning Every Table

Bucketing Table Data Storage

Adding Columns to a Table

Using Columnar Tables

Repeated Data

Many Columns

(Almost) Always Use Compression!

10. Tuning

Using EXPLAIN

EXPLAIN EXTENDED

Limit Tuning

Optimized Joins

Local Mode

Parallel Execution

Strict Mode

Tuning the Number of Mappers and Reducers

JVM Reuse

Indexes

Dynamic Partition Tuning

Speculative Execution

Single MapReduce MultiGROUP BY

Virtual Columns

11. Other File Formats and Compression

Determining Installed Codecs

Choosing a Compression Codec

Enabling Intermediate Compression

Final Output Compression

Sequence Files

Compression in Action

Archive Partition

Compression: Wrapping Up

12. Developing

Changing Log4J Properties

Connecting a Java Debugger to Hive

Building Hive from Source

Running Hive Test Cases

Execution Hooks

Setting Up Hive and Eclipse

Hive in a Maven Project

Unit Testing in Hive with hive_test

The New Plugin Developer Kit

13. Functions

Discovering and Describing Functions

Calling Functions

Standard Functions

Aggregate Functions

Table Generating Functions

A UDF for Finding a Zodiac Sign from a Day

UDF Versus GenericUDF

Permanent Functions

User-Defined Aggregate Functions

Creating a COLLECT UDAF to Emulate GROUP_CONCAT

User-Defined Table Generating Functions

UDTFs that Produce Multiple Rows

UDTFs that Produce a Single Row with Multiple Columns

UDTFs that Simulate Complex Types

Accessing the Distributed Cache from a UDF

Annotations for Use with Functions

Deterministic

Stateful

DistinctLike

Macros

14. Streaming

Identity Transformation

Changing Types

Projecting Transformation

Manipulative Transformations

Using the Distributed Cache

Producing Multiple Rows from a Single Row

Calculating Aggregates with Streaming

CLUSTER BY, DISTRIBUTE BY, SORT BY

GenericMR Tools for Streaming to Java

Calculating Cogroups

15. Customizing Hive File and Record Formats

File Versus Record Formats

Demystifying CREATE TABLE Statements

File Formats

Sequence File

RCFile

Example of a Custom Input Format: DualInputFormat

Record Formats: SerDes

CSV and TSV SerDes

ObjectInspector

Think Big Hive Reflection ObjectInspector

XML UDF

XPath-Related Functions

JSON SerDe

Avro Hive SerDe

Defining Avro Schema Using Table Properties

Defining a Schema from a URI

Evolving Schema

Binary Output

16. Hive Thrift Service

Starting the Thrift Server

Setting Up Groovy to Connect to HiveService

Connecting to HiveServer

Getting Cluster Status

Result Set Schema

Fetching Results

Retrieving Query Plan

Metastore Methods

Example Table Checker

Administrating HiveServer

Productionizing HiveService

Cleanup

Hive ThriftMetastore

ThriftMetastore Configuration

Client Configuration

17. Storage Handlers and NoSQL

Storage Handler Background

HiveStorageHandler

HBase

Cassandra

Static Column Mapping

Transposed Column Mapping for Dynamic Columns

Cassandra SerDe Properties

DynamoDB

18. Security

Integration with Hadoop Security

Authentication with Hive

Authorization in Hive

Users, Groups, and Roles

Privileges to Grant and Revoke

Partition-Level Privileges

Automatic Grants

19. Locking

Locking Support in Hive with Zookeeper

Explicit, Exclusive Locks

20. Hive Integration with Oozie

Oozie Actions

Hive Thrift Service Action

A Two-Query Workflow

Oozie Web Console

Variables in Workflows

Capturing Output

Capturing Output to Variables

21. Hive and Amazon Web Services (AWS)

Why Elastic MapReduce?

Instances

Before You Start

Managing Your EMR Hive Cluster

Thrift Server on EMR Hive

Instance Groups on EMR

Configuring Your EMR Cluster

Deploying hive-site.xml

Deploying a .hiverc Script

Setting Up a Memory-Intensive Configuration

Persistence and the Metastore on EMR

HDFS and S3 on EMR Cluster

Putting Resources, Configs, and Bootstrap Scripts on S3

Logs on S3

Spot Instances

Security Groups

EMR Versus EC2 and Apache Hive

Wrapping Up

22. HCatalog

Introduction

MapReduce

Reading Data

Writing Data

Command Line

Security Model

Architecture

23. Case Studies

m6d.com (Media6Degrees)

Data Science at M6D Using Hive and R

M6D UDF Pseudorank

M6D Managing Hive Data Across Multiple MapReduce Clusters

Outbrain

In-Site Referrer Identification

Counting Uniques

Sessionization

NASA's Jet Propulsion Laboratory

The Regional Climate Model Evaluation System

Our Experience: Why Hive?

Some Challenges and How We Overcame Them

Photobucket

Big Data at Photobucket

What Hardware Do We Use for Hive?

What's in Hive?

Who Does It Support?

SimpleReach

Experiences and Needs from the Customer Trenches

A Karmasphere Perspective

Introduction

Use Case Examples from the Customer Trenches

Glossary

Appendix: References

Index
