Book Introduction
Programming Hive (English Edition)
![Programming Hive (English Edition)](https://www.shukui.net/cover/17/35057374.jpg)
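- Authors: Edward Capriolo, Dean Wampler, Jason Rutherglen
- Publisher: Nanjing: Southeast University Press
- ISBN: 9787564141974
- Publication year: 2013
- Stated page count: 332
- File size: 119MB
- File page count: 350
- Subject terms: Database systems - Programming - English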
Table of Contents
1. Introduction
An Overview of Hadoop and MapReduce
Hive in the Hadoop Ecosystem
Pig
HBase
Cascading, Crunch, and Others
Java Versus Hive: The Word Count Algorithm
What's Next
2. Getting Started
Installing a Preconfigured Virtual Machine
Detailed Installation
Installing Java
Installing Hadoop
Local Mode, Pseudodistributed Mode, and Distributed Mode
Testing Hadoop
Installing Hive
What Is Inside Hive?
Starting Hive
Configuring Your Hadoop Environment
Local Mode Configuration
Distributed and Pseudodistributed Mode Configuration
Metastore Using JDBC
The Hive Command
Command Options
The Command-Line Interface
CLI Options
Variables and Properties
Hive "One Shot" Commands
Executing Hive Queries from Files
The .hiverc File
More on Using the Hive CLI
Command History
Shell Execution
Hadoop dfs Commands from Inside Hive
Comments in Hive Scripts
Query Column Headers
3. Data Types and File Formats
Primitive Data Types
Collection Data Types
Text File Encoding of Data Values
Schema on Read
4. HiveQL: Data Definition
Databases in Hive
Alter Database
Creating Tables
Managed Tables
External Tables
Partitioned, Managed Tables
External Partitioned Tables
Customizing Table Storage Formats
Dropping Tables
Alter Table
Renaming a Table
Adding, Modifying, and Dropping a Table Partition
Changing Columns
Adding Columns
Deleting or Replacing Columns
Alter Table Properties
Alter Storage Properties
Miscellaneous Alter Table Statements
5. HiveQL: Data Manipulation
Loading Data into Managed Tables
Inserting Data into Tables from Queries
Dynamic Partition Inserts
Creating Tables and Loading Them in One Query
Exporting Data
6. HiveQL: Queries
SELECT...FROM Clauses
Specify Columns with Regular Expressions
Computing with Column Values
Arithmetic Operators
Using Functions
LIMIT Clause
Column Aliases
Nested SELECT Statements
CASE...WHEN...THEN Statements
When Hive Can Avoid MapReduce
WHERE Clauses
Predicate Operators
Gotchas with Floating-Point Comparisons
LIKE and RLIKE
GROUP BY Clauses
HAVING Clauses
JOIN Statements
Inner JOIN
Join Optimizations
LEFT OUTER JOIN
OUTER JOIN Gotcha
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI-JOIN
Cartesian Product JOINs
Map-side Joins
ORDER BY and SORT BY
DISTRIBUTE BY with SORT BY
CLUSTER BY
Casting
Casting BINARY Values
Queries that Sample Data
Block Sampling
Input Pruning for Bucket Tables
UNION ALL
7. HiveQL: Views
Views to Reduce Query Complexity
Views that Restrict Data Based on Conditions
Views and Map Type for Dynamic Tables
View Odds and Ends
8. HiveQL: Indexes
Creating an Index
Bitmap Indexes
Rebuilding the Index
Showing an Index
Dropping an Index
Implementing a Custom Index Handler
9. Schema Design
Table-by-Day
Over Partitioning
Unique Keys and Normalization
Making Multiple Passes over the Same Data
The Case for Partitioning Every Table
Bucketing Table Data Storage
Adding Columns to a Table
Using Columnar Tables
Repeated Data
Many Columns
(Almost) Always Use Compression!
10. Tuning
Using EXPLAIN
EXPLAIN EXTENDED
Limit Tuning
Optimized Joins
Local Mode
Parallel Execution
Strict Mode
Tuning the Number of Mappers and Reducers
JVM Reuse
Indexes
Dynamic Partition Tuning
Speculative Execution
Single MapReduce MultiGROUP BY
Virtual Columns
11. Other File Formats and Compression
Determining Installed Codecs
Choosing a Compression Codec
Enabling Intermediate Compression
Final Output Compression
Sequence Files
Compression in Action
Archive Partition
Compression: Wrapping Up
12. Developing
Changing Log4J Properties
Connecting a Java Debugger to Hive
Building Hive from Source
Running Hive Test Cases
Execution Hooks
Setting Up Hive and Eclipse
Hive in a Maven Project
Unit Testing in Hive with hive_test
The New Plugin Developer Kit
13. Functions
Discovering and Describing Functions
Calling Functions
Standard Functions
Aggregate Functions
Table Generating Functions
A UDF for Finding a Zodiac Sign from a Day
UDF Versus GenericUDF
Permanent Functions
User-Defined Aggregate Functions
Creating a COLLECT UDAF to Emulate GROUP_CONCAT
User-Defined Table Generating Functions
UDTFs that Produce Multiple Rows
UDTFs that Produce a Single Row with Multiple Columns
UDTFs that Simulate Complex Types
Accessing the Distributed Cache from a UDF
Annotations for Use with Functions
Deterministic
Stateful
DistinctLike
Macros
14. Streaming
Identity Transformation
Changing Types
Projecting Transformation
Manipulative Transformations
Using the Distributed Cache
Producing Multiple Rows from a Single Row
Calculating Aggregates with Streaming
CLUSTER BY, DISTRIBUTE BY, SORT BY
GenericMR Tools for Streaming to Java
Calculating Cogroups
15. Customizing Hive File and Record Formats
File Versus Record Formats
Demystifying CREATE TABLE Statements
File Formats
Sequence File
RCFile
Example of a Custom Input Format: DualInputFormat
Record Formats: SerDes
CSV and TSV SerDes
ObjectInspector
Think Big Hive Reflection ObjectInspector
XML UDF
XPath-Related Functions
JSON SerDe
Avro Hive SerDe
Defining Avro Schema Using Table Properties
Defining a Schema from a URI
Evolving Schema
Binary Output
16. Hive Thrift Service
Starting the Thrift Server
Setting Up Groovy to Connect to HiveService
Connecting to HiveServer
Getting Cluster Status
Result Set Schema
Fetching Results
Retrieving Query Plan
Metastore Methods
Example Table Checker
Administrating HiveServer
Productionizing HiveService
Cleanup
Hive ThriftMetastore
ThriftMetastore Configuration
Client Configuration
17. Storage Handlers and NoSQL
Storage Handler Background
HiveStorageHandler
HBase
Cassandra
Static Column Mapping
Transposed Column Mapping for Dynamic Columns
Cassandra SerDe Properties
DynamoDB
18. Security
Integration with Hadoop Security
Authentication with Hive
Authorization in Hive
Users, Groups, and Roles
Privileges to Grant and Revoke
Partition-Level Privileges
Automatic Grants
19. Locking
Locking Support in Hive with Zookeeper
Explicit, Exclusive Locks
20. Hive Integration with Oozie
Oozie Actions
Hive Thrift Service Action
A Two-Query Workflow
Oozie Web Console
Variables in Workflows
Capturing Output
Capturing Output to Variables
21. Hive and Amazon Web Services (AWS)
Why Elastic MapReduce?
Instances
Before You Start
Managing Your EMR Hive Cluster
Thrift Server on EMR Hive
Instance Groups on EMR
Configuring Your EMR Cluster
Deploying hive-site.xml
Deploying a .hiverc Script
Setting Up a Memory-Intensive Configuration
Persistence and the Metastore on EMR
HDFS and S3 on EMR Cluster
Putting Resources, Configs, and Bootstrap Scripts on S3
Logs on S3
Spot Instances
Security Groups
EMR Versus EC2 and Apache Hive
Wrapping Up
22. HCatalog
Introduction
MapReduce
Reading Data
Writing Data
Command Line
Security Model
Architecture
23. Case Studies
m6d.com (Media6Degrees)
Data Science at M6D Using Hive and R
M6D UDF Pseudorank
M6D Managing Hive Data Across Multiple MapReduce Clusters
Outbrain
In-Site Referrer Identification
Counting Uniques
Sessionization
NASA's Jet Propulsion Laboratory
The Regional Climate Model Evaluation System
Our Experience: Why Hive?
Some Challenges and How We Overcame Them
Photobucket
Big Data at Photobucket
What Hardware Do We Use for Hive?
What's in Hive?
Who Does It Support?
SimpleReach
Experiences and Needs from the Customer Trenches
A Karmasphere Perspective
Introduction
Use Case Examples from the Customer Trenches
Glossary
Appendix: References
Index