Book Introduction
Parallel Programming: C, MPI and OpenMP (English edition) PDF | Epub | txt | Kindle e-book download
- Author: Quinn (USA)
- Publisher: Tsinghua University Press, Beijing
- ISBN: 730211157X
- Publication year: 2005
- Listed page count: 519
- File size: 24 MB
- File page count: 535
- Subject terms: parallel programs - programming - higher education - textbook - English
PDF Download
Download Notes
Parallel Programming: C, MPI and OpenMP (English edition), PDF e-book download
The downloaded file is a RAR archive; extract it with decompression software to obtain the PDF. All resources on this site are packaged as BitTorrent seeds, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. The BT-capable downloader Free Download Manager (FDM) is also recommended: it is free, ad-free, and cross-platform. Because this title is not currently a popular resource, Xunlei (Thunder) is not recommended for now; once the resource becomes popular, it can also be downloaded with Xunlei.
(The file page count should be greater than the listed page count, except for multi-volume e-books.)
Note: all archives on this site require an extraction password. Click to download the archive extraction tool.
Table of Contents
CHAPTER 1 Motivation and History1
1.1 Introduction1
1.2 Modern Scientific Method3
1.3 Evolution of Supercomputing4
1.4 Modern Parallel Computers5
1.4.1 The Cosmic Cube6
1.4.2 Commercial Parallel Computers6
1.4.3 Beowulf7
1.4.4 Advanced Strategic Computing Initiative8
1.5 Seeking Concurrency9
1.5.1 Data Dependence Graphs9
1.5.2 Data Parallelism10
1.5.3 Functional Parallelism10
1.5.4 Pipelining12
1.5.5 Size Considerations13
1.6 Data Clustering14
1.7 Programming Parallel Computers17
1.7.1 Extend a Compiler17
1.7.2 Extend a Sequential Programming Language18
1.7.3 Add a Parallel Programming Layer19
1.7.4 Create a Parallel Language19
1.7.5 Current Status21
1.8 Summary21
1.9 Key Terms22
1.10 Bibliographic Notes22
1.11 Exercises23
CHAPTER 2 Parallel Architectures27
2.1 Introduction27
2.2 Interconnection Networks28
2.2.1 Shared versus Switched Media28
2.2.2 Switch Network Topologies29
2.2.3 2-D Mesh Network29
2.2.4 Binary Tree Network30
2.2.5 Hypertree Network31
2.2.6 Butterfly Network32
2.2.7 Hypercube Network33
2.2.8 Shuffle-exchange Network35
2.2.9 Summary36
2.3 Processor Arrays37
2.3.1 Architecture and Data-parallel Operations37
2.3.2 Processor Array Performance39
2.3.3 Processor Interconnection Network40
2.3.4 Enabling and Disabling Processors40
2.3.5 Additional Architectural Features42
2.3.6 Shortcomings of Processor Arrays42
2.4 Multiprocessors43
2.4.1 Centralized Multiprocessors43
2.4.2 Distributed Multiprocessors45
2.5 Multicomputers49
2.5.1 Asymmetrical Multicomputers49
2.5.2 Symmetrical Multicomputers51
2.5.3 Which Model Is Best for a Commodity Cluster?52
2.5.4 Differences between Clusters and Networks of Workstations53
2.6 Flynn's Taxonomy54
2.6.1 SISD54
2.6.2 SIMD55
2.6.3 MISD55
2.6.4 MIMD56
2.7 Summary58
2.8 Key Terms59
2.9 Bibliographic Notes59
2.10 Exercises60
CHAPTER 3 Parallel Algorithm Design63
3.1 Introduction63
3.2 The Task/Channel Model63
3.3 Foster's Design Methodology64
3.3.1 Partitioning65
3.3.2 Communication67
3.3.3 Agglomeration68
3.3.4 Mapping70
3.4 Boundary Value Problem73
3.4.1 Introduction73
3.4.2 Partitioning75
3.4.3 Communication75
3.4.4 Agglomeration and Mapping76
3.4.5 Analysis76
3.5 Finding the Maximum77
3.5.1 Introduction77
3.5.2 Partitioning77
3.5.3 Communication77
3.5.4 Agglomeration and Mapping81
3.5.5 Analysis82
3.6 The n-Body Problem82
3.6.1 Introduction82
3.6.2 Partitioning83
3.6.3 Communication83
3.6.4 Agglomeration and Mapping85
3.6.5 Analysis85
3.7 Adding Data Input86
3.7.1 Introduction86
3.7.2 Communication87
3.7.3 Analysis88
3.8 Summary89
3.9 Key Terms90
3.10 Bibliographic Notes90
3.11 Exercises90
CHAPTER 4 Message-Passing Programming93
4.1 Introduction93
4.2 The Message-Passing Model94
4.3 The Message-Passing Interface95
4.4 Circuit Satisfiability96
4.4.1 Function MPI_Init99
4.4.2 Functions MPI_Comm_rank and MPI_Comm_size99
4.4.3 Function MPI_Finalize101
4.4.4 Compiling MPI Programs102
4.4.5 Running MPI Programs102
4.5 Introducing Collective Communication104
4.5.1 Function MPI_Reduce105
4.6 Benchmarking Parallel Performance108
4.6.1 Functions MPI_Wtime and MPI_Wtick108
4.6.2 Function MPI_Barrier108
4.7 Summary110
4.8 Key Terms110
4.9 Bibliographic Notes110
4.10 Exercises111
CHAPTER 5 The Sieve of Eratosthenes115
5.1 Introduction115
5.2 Sequential Algorithm115
5.3 Sources of Parallelism117
5.4 Data Decomposition Options117
5.4.1 Interleaved Data Decomposition118
5.4.2 Block Data Decomposition118
5.4.3 Block Decomposition Macros120
5.4.4 Local Index versus Global Index120
5.4.5 Ramifications of Block Decomposition121
5.5 Developing the Parallel Algorithm121
5.5.1 Function MPI_Bcast122
5.6 Analysis of Parallel Sieve Algorithm122
5.7 Documenting the Parallel Program123
5.8 Benchmarking128
5.9 Improvements129
5.9.1 Delete Even Integers129
5.9.2 Eliminate Broadcast130
5.9.3 Reorganize Loops131
5.9.4 Benchmarking131
5.10 Summary133
5.11 Key Terms134
5.12 Bibliographic Notes134
5.13 Exercises134
CHAPTER 6 Floyd's Algorithm137
6.1 Introduction137
6.2 The All-Pairs Shortest-Path Problem137
6.3 Creating Arrays at Run Time139
6.4 Designing the Parallel Algorithm140
6.4.1 Partitioning140
6.4.2 Communication141
6.4.3 Agglomeration and Mapping142
6.4.4 Matrix Input/Output143
6.5 Point-to-Point Communication145
6.5.1 Function MPI_Send146
6.5.2 Function MPI_Recv147
6.5.3 Deadlock148
6.6 Documenting the Parallel Program149
6.7 Analysis and Benchmarking151
6.8 Summary154
6.9 Key Terms154
6.10 Bibliographic Notes154
6.11 Exercises154
CHAPTER 7 Performance Analysis159
7.1 Introduction159
7.2 Speedup and Efficiency159
7.3 Amdahl's Law161
7.3.1 Limitations of Amdahl's Law164
7.3.2 The Amdahl Effect164
7.4 Gustafson-Barsis's Law164
7.5 The Karp-Flatt Metric167
7.6 The Isoefficiency Metric170
7.7 Summary174
7.8 Key Terms175
7.9 Bibliographic Notes175
7.10 Exercises176
CHAPTER 8 Matrix-Vector Multiplication178
8.1 Introduction178
8.2 Sequential Algorithm179
8.3 Data Decomposition Options180
8.4 Rowwise Block-Striped Decomposition181
8.4.1 Design and Analysis181
8.4.2 Replicating a Block-Mapped Vector183
8.4.3 Function MPI_Allgatherv184
8.4.4 Replicated Vector Input/Output186
8.4.5 Documenting the Parallel Program187
8.4.6 Benchmarking187
8.5 Columnwise Block-Striped Decomposition189
8.5.1 Design and Analysis189
8.5.2 Reading a Columnwise Block-Striped Matrix191
8.5.3 Function MPI_Scatterv191
8.5.4 Printing a Columnwise Block-Striped Matrix193
8.5.5 Function MPI_Gatherv193
8.5.6 Distributing Partial Results195
8.5.7 Function MPI_Alltoallv195
8.5.8 Documenting the Parallel Program196
8.5.9 Benchmarking198
8.6 Checkerboard Block Decomposition199
8.6.1 Design and Analysis199
8.6.2 Creating a Communicator202
8.6.3 Function MPI_Dims_create203
8.6.4 Function MPI_Cart_create204
8.6.5 Reading a Checkerboard Matrix205
8.6.6 Function MPI_Cart_rank205
8.6.7 Function MPI_Cart_coords207
8.6.8 Function MPI_Comm_split207
8.6.9 Benchmarking208
8.7 Summary210
8.8 Key Terms211
8.9 Bibliographic Notes211
8.10 Exercises211
CHAPTER 9 Document Classification216
9.1 Introduction216
9.2 Parallel Algorithm Design217
9.2.1 Partitioning and Communication217
9.2.2 Agglomeration and Mapping217
9.2.3 Manager/Worker Paradigm218
9.2.4 Manager Process219
9.2.5 Function MPI_Abort220
9.2.6 Worker Process221
9.2.7 Creating a Workers-only Communicator223
9.3 Nonblocking Communications223
9.3.1 Manager's Communication224
9.3.2 Function MPI_Irecv224
9.3.3 Function MPI_Wait225
9.3.4 Workers' Communications225
9.3.5 Function MPI_Isend225
9.3.6 Function MPI_Probe225
9.3.7 Function MPI_Get_count226
9.4 Documenting the Parallel Program226
9.5 Enhancements232
9.5.1 Assigning Groups of Documents232
9.5.2 Pipelining232
9.5.3 Function MPI_Testsome234
9.6 Summary235
9.7 Key Terms236
9.8 Bibliographic Notes236
9.9 Exercises236
CHAPTER 10 Monte Carlo Methods239
10.1 Introduction239
10.1.1 Why Monte Carlo Works240
10.1.2 Monte Carlo and Parallel Computing243
10.2 Sequential Random Number Generators243
10.2.1 Linear Congruential244
10.2.2 Lagged Fibonacci245
10.3 Parallel Random Number Generators245
10.3.1 Manager-Worker Method246
10.3.2 Leapfrog Method246
10.3.3 Sequence Splitting247
10.3.4 Parameterization248
10.4 Other Random Number Distributions248
10.4.1 Inverse Cumulative Distribution Function Transformation249
10.4.2 Box-Muller Transformation250
10.4.3 The Rejection Method251
10.5 Case Studies253
10.5.1 Neutron Transport253
10.5.2 Temperature at a Point Inside a 2-D Plate255
10.5.3 Two-Dimensional Ising Model257
10.5.4 Room Assignment Problem259
10.5.5 Parking Garage262
10.5.6 Traffic Circle264
10.6 Summary268
10.7 Key Terms269
10.8 Bibliographic Notes269
10.9 Exercises270
CHAPTER 11 Matrix Multiplication273
11.1 Introduction273
11.2 Sequential Matrix Multiplication274
11.2.1 Iterative, Row-Oriented Algorithm274
11.2.2 Recursive, Block-Oriented Algorithm275
11.3 Rowwise Block-Striped Parallel Algorithm277
11.3.1 Identifying Primitive Tasks277
11.3.2 Agglomeration278
11.3.3 Communication and Further Agglomeration279
11.3.4 Analysis279
11.4 Cannon's Algorithm281
11.4.1 Agglomeration281
11.4.2 Communication283
11.4.3 Analysis284
11.5 Summary286
11.6 Key Terms287
11.7 Bibliographic Notes287
11.8 Exercises287
CHAPTER 12 Solving Linear Systems290
12.1 Introduction290
12.2 Terminology291
12.3 Back Substitution292
12.3.1 Sequential Algorithm292
12.3.2 Row-Oriented Parallel Algorithm293
12.3.3 Column-Oriented Parallel Algorithm295
12.3.4 Comparison295
12.4 Gaussian Elimination296
12.4.1 Sequential Algorithm296
12.4.2 Parallel Algorithms298
12.4.3 Row-Oriented Algorithm299
12.4.4 Column-Oriented Algorithm303
12.4.5 Comparison303
12.4.6 Pipelined, Row-Oriented Algorithm304
12.5 Iterative Methods306
12.6 The Conjugate Gradient Method309
12.6.1 Sequential Algorithm309
12.6.2 Parallel Implementation310
12.7 Summary313
12.8 Key Terms314
12.9 Bibliographic Notes314
12.10 Exercises314
CHAPTER 13 Finite Difference Methods318
13.1 Introduction318
13.2 Partial Differential Equations320
13.2.1 Categorizing PDEs320
13.2.2 Difference Quotients321
13.3 Vibrating String322
13.3.1 Deriving Equations322
13.3.2 Deriving the Sequential Program323
13.3.3 Parallel Program Design324
13.3.4 Isoefficiency Analysis327
13.3.5 Replicating Computations327
13.4 Steady-State Heat Distribution329
13.4.1 Deriving Equations329
13.4.2 Deriving the Sequential Program330
13.4.3 Parallel Program Design332
13.4.4 Isoefficiency Analysis332
13.4.5 Implementation Details334
13.5 Summary334
13.6 Key Terms335
13.7 Bibliographic Notes335
13.8 Exercises335
CHAPTER 14 Sorting338
14.1 Introduction338
14.2 Quicksort339
14.3 A Parallel Quicksort Algorithm340
14.3.1 Definition of Sorted340
14.3.2 Algorithm Development341
14.3.3 Analysis341
14.4 Hyperquicksort343
14.4.1 Algorithm Description343
14.4.2 Isoefficiency Analysis345
14.5 Parallel Sorting by Regular Sampling346
14.5.1 Algorithm Description346
14.5.2 Isoefficiency Analysis347
14.6 Summary349
14.7 Key Terms349
14.8 Bibliographic Notes350
14.9 Exercises350
CHAPTER 15 The Fast Fourier Transform353
15.1 Introduction353
15.2 Fourier Analysis353
15.3 The Discrete Fourier Transform355
15.3.1 Inverse Discrete Fourier Transform357
15.3.2 Sample Application: Polynomial Multiplication357
15.4 The Fast Fourier Transform360
15.5 Parallel Program Design363
15.5.1 Partitioning and Communication363
15.5.2 Agglomeration and Mapping365
15.5.3 Isoefficiency Analysis365
15.6 Summary367
15.7 Key Terms367
15.8 Bibliographic Notes367
15.9 Exercises367
CHAPTER 16 Combinatorial Search369
16.1 Introduction369
16.2 Divide and Conquer370
16.3 Backtrack Search371
16.3.1 Example371
16.3.2 Time and Space Complexity374
16.4 Parallel Backtrack Search374
16.5 Distributed Termination Detection377
16.6 Branch and Bound380
16.6.1 Example380
16.6.2 Sequential Algorithm382
16.6.3 Analysis385
16.7 Parallel Branch and Bound385
16.7.1 Storing and Sharing Unexamined Subproblems386
16.7.2 Efficiency387
16.7.3 Halting Conditions387
16.8 Searching Game Trees388
16.8.1 Minimax Algorithm388
16.8.2 Alpha-Beta Pruning392
16.8.3 Enhancements to Alpha-Beta Pruning395
16.9 Parallel Alpha-Beta Search395
16.9.1 Parallel Aspiration Search396
16.9.2 Parallel Subtree Evaluation396
16.9.3 Distributed Tree Search397
16.10 Summary399
16.11 Key Terms400
16.12 Bibliographic Notes400
16.13 Exercises401
CHAPTER 17 Shared-Memory Programming404
17.1 Introduction404
17.2 The Shared-Memory Model405
17.3 Parallel for Loops407
17.3.1 parallel for Pragma408
17.3.2 Function omp_get_num_procs410
17.3.3 Function omp_set_num_threads410
17.4 Declaring Private Variables410
17.4.1 private Clause411
17.4.2 firstprivate Clause412
17.4.3 lastprivate Clause412
17.5 Critical Sections413
17.5.1 critical Pragma415
17.6 Reductions415
17.7 Performance Improvements417
17.7.1 Inverting Loops417
17.7.2 Conditionally Executing Loops418
17.7.3 Scheduling Loops419
17.8 More General Data Parallelism421
17.8.1 parallel Pragma422
17.8.2 Function omp_get_thread_num423
17.8.3 Function omp_get_num_threads425
17.8.4 for Pragma425
17.8.5 single Pragma427
17.8.6 nowait Clause427
17.9 Functional Parallelism428
17.9.1 parallel sections Pragma429
17.9.2 section Pragma429
17.9.3 sections Pragma429
17.10 Summary430
17.11 Key Terms432
17.12 Bibliographic Notes432
17.13 Exercises433
CHAPTER 18 Combining MPI and OpenMP436
18.1 Introduction436
18.2 Conjugate Gradient Method438
18.2.1 MPI Program438
18.2.2 Functional Profiling442
18.2.3 Parallelizing Function matrix_vector_product442
18.2.4 Benchmarking443
18.3 Jacobi Method444
18.3.1 Profiling MPI Program444
18.3.2 Parallelizing Function find_steady_state444
18.3.3 Benchmarking446
18.4 Summary448
18.5 Exercises448
APPENDIX A MPI Functions450
APPENDIX B Utility Functions485
B.1 Header File MyMPI.h485
B.2 Source File MyMPI.c486
APPENDIX C Debugging MPI Programs505
C.1 Introduction505
C.2 Typical Bugs in MPI Programs505
C.2.1 Bugs Resulting in Deadlock505
C.2.2 Bugs Resulting in Incorrect Results506
C.2.3 Advantages of Collective Communications507
C.3 Practical Debugging Strategies507
APPENDIX D Review of Complex Numbers509
APPENDIX E OpenMP Functions513
Bibliography515