Book Information
Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools
![Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools](https://www.shukui.net/cover/15/30978948.jpg)
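- Author: J. A. Fisher (USA) et al.
- Publisher: Beijing: China Machine Press
- ISBN: 7111197712
- Publication year: 2006
- Listed page count: 671
- File size: 377 MB
- File page count: 708
- Subject terms: Microcomputers - System design - English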
(The file page count should exceed the listed page count, except for e-books split into multiple volumes.)
Table of Contents
CHAPTER 1 An Introduction to Embedded Processing
1.1 What Is Embedded Computing?
1.1.1 Attributes of Embedded Devices
1.1.2 Embedded Is Growing
1.2 Distinguishing Between Embedded and General-Purpose Computing
1.2.1 The "Run One Program Only" Phenomenon
1.2.2 Backward and Binary Compatibility
1.2.3 Physical Limits in the Embedded Domain
1.3 Characterizing Embedded Computing
1.3.1 Categorization by Type of Processing Engine
Digital Signal Processors
Network Processors
1.3.2 Categorization by Application Area
The Image Processing and Consumer Market
The Communications Market
The Automotive Market
1.3.3 Categorization by Workload Differences
1.4 Embedded Market Structure
1.4.1 The Market for Embedded Processor Cores
1.4.2 Business Model of Embedded Processors
1.4.3 Costs and Product Volume
1.4.4 Software and the Embedded Software Market
1.4.5 Industry Standards
1.4.6 Product Life Cycle
1.4.7 The Transition to SoC Design
Effects of SoC on the Business Model
Centers of Embedded Design
1.4.8 The Future of Embedded Systems
Connectivity: Always-on Infrastructure
State: Personal Storage
Administration
Security
The Next Generation
1.5 Further Reading
1.6 Exercises
CHAPTER 2 An Overview of VLIW and ILP
2.1 Semantics and Parallelism
2.1.1 Baseline: Sequential Program Semantics
2.1.2 Pipelined Execution, Overlapped Execution, and Multiple Execution Units
2.1.3 Dependence and Program Rearrangement
2.1.4 ILP and Other Forms of Parallelism
2.2 Design Philosophies
2.2.1 An Illustration of Design Philosophies: RISC Versus CISC
2.2.2 First Definition of VLIW
2.2.3 A Design Philosophy: VLIW
VLIW Versus Superscalar
VLIW Versus DSP
2.3 Role of the Compiler
2.3.1 The Phases of a High-Performance Compiler
2.3.2 Compiling for ILP and VLIW
2.4 VLIW in the Embedded and DSP Domains
2.5 Historical Perspective and Further Reading
2.5.1 ILP Hardware in the 1960s and 1970s
Early Supercomputer Arithmetic Units
Attached Signal Processors
Horizontal Microcode
2.5.2 The Development of ILP Code Generation in the 1980s
Acyclic Microcode Compaction Techniques
Cyclic Techniques: Software Pipelining
2.5.3 VLIW Development in the 1980s
2.5.4 ILP in the 1990s and 2000s
2.6 Exercises
CHAPTER 3 An Overview of ISA Design
3.1 Overview: What to Hide
3.1.1 Architectural State: Memory and Registers
3.1.2 Pipelining and Operational Latency
3.1.3 Multiple Issue and Hazards
Exposing Dependence and Independence
Structural Hazards
Resource Hazards
3.1.4 Exception and Interrupt Handling
3.1.5 Discussion
3.2 Basic VLIW Design Principles
3.2.1 Implications for Compilers and Implementations
3.2.2 Execution Model Subtleties
3.3 Designing a VLIW ISA for Embedded Systems
3.3.1 Application Domain
3.3.2 ILP Style
3.3.3 Hardware/Software Tradeoffs
3.4 Instruction-set Encoding
3.4.1 A Larger Definition of Architecture
3.4.2 Encoding and Architectural Style
RISC Encodings
CISC Encodings
VLIW Encodings
Why Not Superscalar Encodings?
DSP Encodings
Vector Encodings
3.5 VLIW Encoding
3.5.1 Operation Encoding
3.5.2 Instruction Encoding
Fixed-overhead Encoding
Distributed Encoding
Template-based Encoding
3.5.3 Dispatching and Opcode Subspaces
3.6 Encoding and Instruction-set Extensions
3.7 Further Reading
3.8 Exercises
CHAPTER 4 Architectural Structures in ISA Design
4.1 The Datapath
4.1.1 Location of Operands and Results
4.1.2 Datapath Width
4.1.3 Operation Repertoire
Simple Integer and Compare Operations
Carry, Overflow, and Other Flags
Common Bitwise Utilities
Integer Multiplication
Fixed-point Multiplication
Integer Division
Floating-point Operations
Saturated Arithmetic
4.1.4 Micro-SIMD Operations
Alignment Issues
Precision Issues
Dealing with Control Flow
Pack, Unpack, and Mix
Reductions
4.1.5 Constants
4.2 Registers and Clusters
4.2.1 Clustering
Architecturally Invisible Clustering
Architecturally Visible Clustering
4.2.2 Heterogeneous Register Files
4.2.3 Address and Data Registers
4.2.4 Special Register File Features
Indexed Register Files
Rotating Register Files
4.3 Memory Architecture
4.3.1 Addressing Modes
4.3.2 Access Sizes
4.3.3 Alignment Issues
4.3.4 Caches and Local Memories
Prefetching
Local Memories and Lockable Caches
4.3.5 Exotic Addressing Modes for Embedded Processing
4.4 Branch Architecture
4.4.1 Unbundling Branches
Two-step Branching
Three-step Branching
4.4.2 Multiway Branches
4.4.3 Multicluster Branches
4.4.4 Branches and Loops
4.5 Speculation and Predication
4.5.1 Speculation
Control Speculation
Data Speculation
4.5.2 Predication
Full Predication
Partial Predication
Cost and Benefits of Predication
Predication in the Embedded Domain
4.6 System Operations
4.7 Further Reading
4.8 Exercises
CHAPTER 5 Microarchitecture Design
5.1 Register File Design
5.1.1 Register File Structure
5.1.2 Register Files, Technology, and Clustering
5.1.3 Separate Address and Data Register Files
5.1.4 Special Registers and Register File Features
5.2 Pipeline Design
5.2.1 Balancing a Pipeline
5.3 VLIW Fetch, Sequencing, and Decoding
5.3.1 Instruction Fetch
5.3.2 Alignment and Instruction Length
5.3.3 Decoding and Dispersal
5.3.4 Decoding and ISA Extensions
5.4 The Datapath
5.4.1 Execution Units
5.4.2 Bypassing and Forwarding Logic
5.4.3 Exposing Latencies
5.4.4 Predication and Selects
5.5 Memory Architecture
5.5.1 Local Memory and Caches
5.5.2 Byte Manipulation
5.5.3 Addressing, Protection, and Virtual Memory
5.5.4 Memories in Multiprocessor Systems
5.5.5 Memory Speculation
5.6 The Control Unit
5.6.1 Branch Architecture
5.6.2 Predication and Selects
5.6.3 Interrupts and Exceptions
5.6.4 Exceptions and Pipelining
Drain and Flush Pipeline Models
Early Commit
Delayed Commit
5.7 Control Registers
5.8 Power Considerations
5.8.1 Energy Efficiency and ILP
System-level Power Considerations
5.9 Further Reading
5.10 Exercises
CHAPTER 6 System Design and Simulation
6.1 System-on-a-Chip (SoC)
6.1.1 IP Blocks and Design Reuse
A Concrete SoC Example
Virtual Components and the VSIA Alliance
6.1.2 Design Flows
Creation Flow
Verification Flow
6.1.3 SoC Buses
Data Widths
Masters, Slaves, and Arbiters
Bus Transactions
Test Modes
6.2 Processor Cores and SoC
6.2.1 Nonprogrammable Accelerators
Reconfigurable Logic
6.2.2 Multiprocessing on a Chip
Symmetric Multiprocessing
Heterogeneous Multiprocessing
Example: A Multicore Platform for Mobile Multimedia
6.3 Overview of Simulation
6.3.1 Using Simulators
6.4 Simulating a VLIW Architecture
6.4.1 Interpretation
6.4.2 Compiled Simulation
Memory
Registers
Control Flow
Exceptions
Analysis of Compiled Simulation
Performance Measurement and Compiled Simulation
6.4.3 Dynamic Binary Translation
6.4.4 Trace-driven Simulation
6.5 System Simulation
6.5.1 I/O and Concurrent Activities
6.5.2 Hardware Simulation
Discrete Event Simulation
6.5.3 Accelerating Simulation
In-Circuit Emulation
Hardware Accelerators for Simulation
6.6 Validation and Verification
6.6.1 Co-simulation
6.6.2 Simulation, Verification, and Test
Formal Verification
Design for Testability
Debugging Support for SoC
6.7 Further Reading
6.8 Exercises
CHAPTER 7 Embedded Compiling and Toolchains
7.1 What Is Important in an ILP Compiler?
7.2 Embedded Cross-Development Toolchains
7.2.1 Compiler
7.2.2 Assembler
7.2.3 Libraries
7.2.4 Linker
7.2.5 Post-link Optimizer
7.2.6 Run-time Program Loader
7.2.7 Simulator
7.2.8 Debuggers and Monitor ROMs
7.2.9 Automated Test Systems
7.2.10 Profiling Tools
7.2.11 Binary Utilities
7.3 Structure of an ILP Compiler
7.3.1 Front End
7.3.2 Machine-independent Optimizer
7.3.3 Back End: Machine-specific Optimizations
7.4 Code Layout
7.4.1 Code Layout Techniques
DAG-based Placement
The "Pettis-Hansen" Technique
Procedure Inlining
Cache Line Coloring
Temporal-order Placement
7.5 Embedded-Specific Tradeoffs for Compilers
7.5.1 Space, Time, and Energy Tradeoffs
7.5.2 Power-specific Optimizations
Fundamentals of Power Dissipation
Power-aware Software Techniques
7.6 DSP-Specific Compiler Optimizations
7.6.1 Compiler-visible Features of DSPs
Heterogeneous Registers
Addressing Modes
Limited Connectivity
Local Memories
Harvard Architecture
7.6.2 Instruction Selection and Scheduling
7.6.3 Address Computation and Offset Assignment
7.6.4 Local Memories
7.6.5 Register Assignment Techniques
7.6.6 Retargetable DSP and ASIP Compilers
7.7 Further Reading
7.8 Exercises
CHAPTER 8 Compiling for VLIWs and ILP
8.1 Profiling
8.1.1 Types of Profiles
8.1.2 Profile Collection
8.1.3 Synthetic Profiles (Heuristics in Lieu of Profiles)
8.1.4 Profile Bookkeeping and Methodology
8.1.5 Profiles and Embedded Applications
8.2 Scheduling
8.2.1 Acyclic Region Types and Shapes
Basic Blocks
Traces
Superblocks
Hyperblocks
Treegions
Percolation Scheduling
8.2.2 Region Formation
Region Selection
Enlargement Techniques
Phase-ordering Considerations
8.2.3 Schedule Construction
Analyzing Programs for Schedule Construction
Compaction Techniques
Compensation Code
Another View of Scheduling Problems
8.2.4 Resource Management During Scheduling
Resource Vectors
Finite-state Automata
8.2.5 Loop Scheduling
Modulo Scheduling
8.2.6 Clustering
8.3 Register Allocation
8.3.1 Phase-ordering Issues
Register Allocation and Scheduling
8.4 Speculation and Predication
8.4.1 Control and Data Speculation
8.4.2 Predicated Execution
8.4.3 Prefetching
8.4.4 Data Layout Methods
8.4.5 Static and Hybrid Branch Prediction
8.5 Instruction Selection
8.6 Further Reading
8.7 Exercises
CHAPTER 9 The Run-time System
9.1 Exceptions, Interrupts, and Traps
9.1.1 Exception Handling
9.2 Application Binary Interface Considerations
9.2.1 Loading Programs
9.2.2 Data Layout
9.2.3 Accessing Global Data
9.2.4 Calling Conventions
Registers
Call Instructions
Call Sites
Function Prologues and Epilogues
9.2.5 Advanced ABI Topics
Variable-length Argument Lists
Dynamic Stack Allocation
Garbage Collection
Linguistic Exceptions
9.3 Code Compression
9.3.1 Motivations
9.3.2 Compression and Information Theory
9.3.3 Architectural Compression Options
Decompression on Fetch
Decompression on Refill
Load-time Decompression
9.3.4 Compression Methods
Hand-tuned ISAs
Ad Hoc Compression Schemes
RAM Decompression
Dictionary-based Software Compression
Cache-based Compression
Quantifying Compression Benefits
9.4 Embedded Operating Systems
9.4.1 "Traditional" OS Issues Revisited
9.4.2 Real-time Systems
Real-time Scheduling
9.4.3 Multiple Flows of Control
Threads, Processes, and Microkernels
9.4.4 Market Considerations
Embedded Linux
9.4.5 Downloadable Code and Virtual Machines
9.5 Multiprocessing and Multithreading
9.5.1 Multiprocessing in the Embedded World
9.5.2 Multiprocessing and VLIW
9.6 Further Reading
9.7 Exercises
CHAPTER 10 Application Design and Customization
10.1 Programming Language Choices
10.1.1 Overview of Embedded Programming Languages
10.1.2 Traditional C and ANSI C
10.1.3 C++ and Embedded C++
Embedded C++
10.1.4 Matlab
10.1.5 Embedded Java
The Allure of Embedded Java
Embedded Java: The Dark Side
10.1.6 C Extensions for Digital Signal Processing
Restricted Pointers
Fixed-point Data Types
Circular Arrays
Matrix Referencing and Operators
10.1.7 Pragmas, Intrinsics, and Inline Assembly Language Code
Compiler Pragmas and Type Annotations
Assembler Inserts and Intrinsics
10.2 Performance, Benchmarking, and Tuning
10.2.1 Importance and Methodology
10.2.2 Tuning an Application for Performance
Profiling
Performance Tuning and Compilers
Developing for ILP Targets
10.2.3 Benchmarking
10.3 Scalability and Customizability
10.3.1 Scalability and Architecture Families
10.3.2 Exploration and Scalability
10.3.3 Customization
Customized Implementations
10.3.4 Reconfigurable Hardware
Using Programmable Logic
10.3.5 Customizable Processors and Tools
Describing Processors
10.3.6 Tools for Customization
Customizable Compilers
10.3.7 Architecture Exploration
Dealing with the Complexity
Other Barriers to Customization
Wrapping Up
10.4 Further Reading
10.5 Exercises
CHAPTER 11 Application Areas
11.1 Digital Printing and Imaging
11.1.1 Photo Printing Pipeline
JPEG Decompression
Scaling
Color Space Conversion
Dithering
11.1.2 Implementation and Performance
Summary
11.2 Telecom Applications
11.2.1 Voice Coding
Waveform Codecs
Vocoders
Hybrid Coders
11.2.2 Multiplexing
11.2.3 The GSM Enhanced Full-rate Codec
Implementation and Performance
11.3 Other Application Areas
11.3.1 Digital Video
MPEG-1 and MPEG-2
MPEG-4
11.3.2 Automotive
Fail-safety and Fault Tolerance
Engine Control Units
In-vehicle Networking
11.3.3 Hard Disk Drives
Motor Control
Data Decoding
Disk Scheduling and On-disk Management Tasks
Disk Scheduling and Off-disk Management Tasks
11.3.4 Networking and Network Processors
Network Processors
11.4 Further Reading
11.5 Exercises
APPENDIX A The VEX System
A.1 The VEX Instruction-set Architecture
A.1.1 VEX Assembly Language Notation
A.1.2 Clusters
A.1.3 Execution Model
A.1.4 Architecture State
A.1.5 Arithmetic and Logic Operations
Examples
A.1.6 Intercluster Communication
A.1.7 Memory Operations
A.1.8 Control Operations
Examples
A.1.9 Structure of the Default VEX Cluster
Register Files and Immediates
A.1.10 VEX Semantics
A.2 The VEX Run-time Architecture
A.2.1 Data Allocation and Layout
A.2.2 Register Usage
A.2.3 Stack Layout and Procedure Linkage
Procedure Linkage
A.3 The VEX C Compiler
A.3.1 Command Line Options
Output Files
Preprocessing
Optimization
Profiling
Language Definition
Libraries
Passing Options to Compile Phases
Terminal Output and Process Control
Other Options
A.3.2 Compiler Pragmas
Unrolling and Profiling
Assertions
Memory Disambiguation
Cache Control
A.3.3 Inline Expansion
Multiflow-style Inlining
C99-style Inlining
A.3.4 Machine Model Parameters
A.3.5 Custom Instructions
A.4 Visualization Tools
A.5 The VEX Simulation System
A.5.1 gprof Support
A.5.2 Simulating Custom Instructions
A.5.3 Simulating the Memory Hierarchy
A.6 Customizing the VEX Toolchain
A.6.1 Clusters
A.6.2 Machine Model Resources
A.6.3 Memory Hierarchy Parameters
A.7 Examples of Tool Usage
A.7.1 Compile and Run
A.7.2 Profiling
A.7.3 Custom Architectures
A.8 Exercises
APPENDIX B Glossary
APPENDIX C Bibliography
Index