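Book Information

Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools

Authors: (US) Fisher, J.A., et al.
Publisher: Beijing: China Machine Press
ISBN: 7111197712
Publication year: 2006
Listed page count: 671
File size: 377MB
File page count: 708
Subject headings: Microcomputers - System design - English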

Table of Contents

CHAPTER 1 An Introduction to Embedded Processing

1.1 What Is Embedded Computing?

1.1.1 Attributes of Embedded Devices

1.1.2 Embedded Is Growing

1.2 Distinguishing Between Embedded and General-Purpose Computing

1.2.1 The "Run One Program Only" Phenomenon

1.2.2 Backward and Binary Compatibility

1.2.3 Physical Limits in the Embedded Domain

1.3 Characterizing Embedded Computing

1.3.1 Categorization by Type of Processing Engine

Digital Signal Processors

Network Processors

1.3.2 Categorization by Application Area

The Image Processing and Consumer Market

The Communications Market

The Automotive Market

1.3.3 Categorization by Workload Differences

1.4 Embedded Market Structure

1.4.1 The Market for Embedded Processor Cores

1.4.2 Business Model of Embedded Processors

1.4.3 Costs and Product Volume

1.4.4 Software and the Embedded Software Market

1.4.5 Industry Standards

1.4.6 Product Life Cycle

1.4.7 The Transition to SoC Design

Effects of SoC on the Business Model

Centers of Embedded Design

1.4.8 The Future of Embedded Systems

Connectivity: Always-on Infrastructure

State: Personal Storage

Administration

Security

The Next Generation

1.5 Further Reading

1.6 Exercises

CHAPTER 2 An Overview of VLIW and ILP

2.1 Semantics and Parallelism

2.1.1 Baseline: Sequential Program Semantics

2.1.2 Pipelined Execution, Overlapped Execution, and Multiple Execution Units

2.1.3 Dependence and Program Rearrangement

2.1.4 ILP and Other Forms of Parallelism

2.2 Design Philosophies

2.2.1 An Illustration of Design Philosophies: RISC Versus CISC

2.2.2 First Definition of VLIW

2.2.3 A Design Philosophy: VLIW

VLIW Versus Superscalar

VLIW Versus DSP

2.3 Role of the Compiler

2.3.1 The Phases of a High-Performance Compiler

2.3.2 Compiling for ILP and VLIW

2.4 VLIW in the Embedded and DSP Domains

2.5 Historical Perspective and Further Reading

2.5.1 ILP Hardware in the 1960s and 1970s

Early Supercomputer Arithmetic Units

Attached Signal Processors

Horizontal Microcode

2.5.2 The Development of ILP Code Generation in the 1980s

Acyclic Microcode Compaction Techniques

Cyclic Techniques: Software Pipelining

2.5.3 VLIW Development in the 1980s

2.5.4 ILP in the 1990s and 2000s

2.6 Exercises

CHAPTER 3 An Overview of ISA Design

3.1 Overview: What to Hide

3.1.1 Architectural State: Memory and Registers

3.1.2 Pipelining and Operational Latency

3.1.3 Multiple Issue and Hazards

Exposing Dependence and Independence

Structural Hazards

Resource Hazards

3.1.4 Exception and Interrupt Handling

3.1.5 Discussion

3.2 Basic VLIW Design Principles

3.2.1 Implications for Compilers and Implementations

3.2.2 Execution Model Subtleties

3.3 Designing a VLIW ISA for Embedded Systems

3.3.1 Application Domain

3.3.2 ILP Style

3.3.3 Hardware/Software Tradeoffs

3.4 Instruction-set Encoding

3.4.1 A Larger Definition of Architecture

3.4.2 Encoding and Architectural Style

RISC Encodings

CISC Encodings

VLIW Encodings

Why Not Superscalar Encodings?

DSP Encodings

Vector Encodings

3.5 VLIW Encoding

3.5.1 Operation Encoding

3.5.2 Instruction Encoding

Fixed-overhead Encoding

Distributed Encoding

Template-based Encoding

3.5.3 Dispatching and Opcode Subspaces

3.6 Encoding and Instruction-set Extensions

3.7 Further Reading

3.8 Exercises

CHAPTER 4 Architectural Structures in ISA Design

4.1 The Datapath

4.1.1 Location of Operands and Results

4.1.2 Datapath Width

4.1.3 Operation Repertoire

Simple Integer and Compare Operations

Carry, Overflow, and Other Flags

Common Bitwise Utilities

Integer Multiplication

Fixed-point Multiplication

Integer Division

Floating-point Operations

Saturated Arithmetic

4.1.4 Micro-SIMD Operations

Alignment Issues

Precision Issues

Dealing with Control Flow

Pack, Unpack, and Mix

Reductions

4.1.5 Constants

4.2 Registers and Clusters

4.2.1 Clustering

Architecturally Invisible Clustering

Architecturally Visible Clustering

4.2.2 Heterogeneous Register Files

4.2.3 Address and Data Registers

4.2.4 Special Register File Features

Indexed Register Files

Rotating Register Files

4.3 Memory Architecture

4.3.1 Addressing Modes

4.3.2 Access Sizes

4.3.3 Alignment Issues

4.3.4 Caches and Local Memories

Prefetching

Local Memories and Lockable Caches

4.3.5 Exotic Addressing Modes for Embedded Processing

4.4 Branch Architecture

4.4.1 Unbundling Branches

Two-step Branching

Three-step Branching

4.4.2 Multiway Branches

4.4.3 Multicluster Branches

4.4.4 Branches and Loops

4.5 Speculation and Predication

4.5.1 Speculation

Control Speculation

Data Speculation

4.5.2 Predication

Full Predication

Partial Predication

Cost and Benefits of Predication

Predication in the Embedded Domain

4.6 System Operations

4.7 Further Reading

4.8 Exercises

CHAPTER 5 Microarchitecture Design

5.1 Register File Design

5.1.1 Register File Structure

5.1.2 Register Files, Technology, and Clustering

5.1.3 Separate Address and Data Register Files

5.1.4 Special Registers and Register File Features

5.2 Pipeline Design

5.2.1 Balancing a Pipeline

5.3 VLIW Fetch, Sequencing, and Decoding

5.3.1 Instruction Fetch

5.3.2 Alignment and Instruction Length

5.3.3 Decoding and Dispersal

5.3.4 Decoding and ISA Extensions

5.4 The Datapath

5.4.1 Execution Units

5.4.2 Bypassing and Forwarding Logic

5.4.3 Exposing Latencies

5.4.4 Predication and Selects

5.5 Memory Architecture

5.5.1 Local Memory and Caches

5.5.2 Byte Manipulation

5.5.3 Addressing, Protection, and Virtual Memory

5.5.4 Memories in Multiprocessor Systems

5.5.5 Memory Speculation

5.6 The Control Unit

5.6.1 Branch Architecture

5.6.2 Predication and Selects

5.6.3 Interrupts and Exceptions

5.6.4 Exceptions and Pipelining

Drain and Flush Pipeline Models

Early Commit

Delayed Commit

5.7 Control Registers

5.8 Power Considerations

5.8.1 Energy Efficiency and ILP

System-level Power Considerations

5.9 Further Reading

5.10 Exercises

CHAPTER 6 System Design and Simulation

6.1 System-on-a-Chip (SoC)

6.1.1 IP Blocks and Design Reuse

A Concrete SoC Example

Virtual Components and the VSIA Alliance

6.1.2 Design Flows

Creation Flow

Verification Flow

6.1.3 SoC Buses

Data Widths

Masters, Slaves, and Arbiters

Bus Transactions

Test Modes

6.2 Processor Cores and SoC

6.2.1 Nonprogrammable Accelerators

Reconfigurable Logic

6.2.2 Multiprocessing on a Chip

Symmetric Multiprocessing

Heterogeneous Multiprocessing

Example: A Multicore Platform for Mobile Multimedia

6.3 Overview of Simulation

6.3.1 Using Simulators

6.4 Simulating a VLIW Architecture

6.4.1 Interpretation

6.4.2 Compiled Simulation

Memory

Registers

Control Flow

Exceptions

Analysis of Compiled Simulation

Performance Measurement and Compiled Simulation

6.4.3 Dynamic Binary Translation

6.4.4 Trace-driven Simulation

6.5 System Simulation

6.5.1 I/O and Concurrent Activities

6.5.2 Hardware Simulation

Discrete Event Simulation

6.5.3 Accelerating Simulation

In-Circuit Emulation

Hardware Accelerators for Simulation

6.6 Validation and Verification

6.6.1 Co-simulation

6.6.2 Simulation, Verification, and Test

Formal Verification

Design for Testability

Debugging Support for SoC

6.7 Further Reading

6.8 Exercises

CHAPTER 7 Embedded Compiling and Toolchains

7.1 What Is Important in an ILP Compiler?

7.2 Embedded Cross-Development Toolchains

7.2.1 Compiler

7.2.2 Assembler

7.2.3 Libraries

7.2.4 Linker

7.2.5 Post-link Optimizer

7.2.6 Run-time Program Loader

7.2.7 Simulator

7.2.8 Debuggers and Monitor ROMs

7.2.9 Automated Test Systems

7.2.10 Profiling Tools

7.2.11 Binary Utilities

7.3 Structure of an ILP Compiler

7.3.1 Front End

7.3.2 Machine-independent Optimizer

7.3.3 Back End: Machine-specific Optimizations

7.4 Code Layout

7.4.1 Code Layout Techniques

DAG-based Placement

The "Pettis-Hansen" Technique

Procedure Inlining

Cache Line Coloring

Temporal-order Placement

7.5 Embedded-Specific Tradeoffs for Compilers

7.5.1 Space, Time, and Energy Tradeoffs

7.5.2 Power-specific Optimizations

Fundamentals of Power Dissipation

Power-aware Software Techniques

7.6 DSP-Specific Compiler Optimizations

7.6.1 Compiler-visible Features of DSPs

Heterogeneous Registers

Addressing Modes

Limited Connectivity

Local Memories

Harvard Architecture

7.6.2 Instruction Selection and Scheduling

7.6.3 Address Computation and Offset Assignment

7.6.4 Local Memories

7.6.5 Register Assignment Techniques

7.6.6 Retargetable DSP and ASIP Compilers

7.7 Further Reading

7.8 Exercises

CHAPTER 8 Compiling for VLIWs and ILP

8.1 Profiling

8.1.1 Types of Profiles

8.1.2 Profile Collection

8.1.3 Synthetic Profiles (Heuristics in Lieu of Profiles)

8.1.4 Profile Bookkeeping and Methodology

8.1.5 Profiles and Embedded Applications

8.2 Scheduling

8.2.1 Acyclic Region Types and Shapes

Basic Blocks

Traces

Superblocks

Hyperblocks

Treegions

Percolation Scheduling

8.2.2 Region Formation

Region Selection

Enlargement Techniques

Phase-ordering Considerations

8.2.3 Schedule Construction

Analyzing Programs for Schedule Construction

Compaction Techniques

Compensation Code

Another View of Scheduling Problems

8.2.4 Resource Management During Scheduling

Resource Vectors

Finite-state Automata

8.2.5 Loop Scheduling

Modulo Scheduling

8.2.6 Clustering

8.3 Register Allocation

8.3.1 Phase-ordering Issues

8.4 Speculation and Predication

8.4.1 Control and Data Speculation

8.4.2 Predicated Execution

8.4.3 Prefetching

8.4.4 Data Layout Methods

8.4.5 Static and Hybrid Branch Prediction

8.5 Instruction Selection

8.6 Further Reading

8.7 Exercises

CHAPTER 9 The Run-time System

9.1 Exceptions, Interrupts, and Traps

9.1.1 Exception Handling

9.2 Application Binary Interface Considerations

9.2.1 Loading Programs

9.2.2 Data Layout

9.2.3 Accessing Global Data

9.2.4 Calling Conventions

Registers

Call Instructions

Call Sites

Function Prologues and Epilogues

9.2.5 Advanced ABI Topics

Variable-length Argument Lists

Dynamic Stack Allocation

Garbage Collection

Linguistic Exceptions

9.3 Code Compression

9.3.1 Motivations

9.3.2 Compression and Information Theory

9.3.3 Architectural Compression Options

Decompression on Fetch

Decompression on Refill

Load-time Decompression

9.3.4 Compression Methods

Hand-tuned ISAs

Ad Hoc Compression Schemes

RAM Decompression

Dictionary-based Software Compression

Cache-based Compression

Quantifying Compression Benefits

9.4 Embedded Operating Systems

9.4.1 "Traditional" OS Issues Revisited

9.4.2 Real-time Systems

Real-time Scheduling

9.4.3 Multiple Flows of Control

Threads, Processes, and Microkernels

9.4.4 Market Considerations

Embedded Linux

9.4.5 Downloadable Code and Virtual Machines

9.5 Multiprocessing and Multithreading

9.5.1 Multiprocessing in the Embedded World

9.5.2 Multiprocessing and VLIW

9.6 Further Reading

9.7 Exercises

CHAPTER 10 Application Design and Customization

10.1 Programming Language Choices

10.1.1 Overview of Embedded Programming Languages

10.1.2 Traditional C and ANSI C

10.1.3 C++ and Embedded C++

Embedded C++

10.1.4 Matlab

10.1.5 Embedded Java

The Allure of Embedded Java

Embedded Java: The Dark Side

10.1.6 C Extensions for Digital Signal Processing

Restricted Pointers

Fixed-point Data Types

Circular Arrays

Matrix Referencing and Operators

10.1.7 Pragmas, Intrinsics, and Inline Assembly Language Code

Compiler Pragmas and Type Annotations

Assembler Inserts and Intrinsics

10.2 Performance, Benchmarking, and Tuning

10.2.1 Importance and Methodology

10.2.2 Tuning an Application for Performance

Profiling

Performance Tuning and Compilers

Developing for ILP Targets

10.2.3 Benchmarking

10.3 Scalability and Customizability

10.3.1 Scalability and Architecture Families

10.3.2 Exploration and Scalability

10.3.3 Customization

Customized Implementations

10.3.4 Reconfigurable Hardware

Using Programmable Logic

10.3.5 Customizable Processors and Tools

Describing Processors

10.3.6 Tools for Customization

Customizable Compilers

10.3.7 Architecture Exploration

Dealing with the Complexity

Other Barriers to Customization

Wrapping Up

10.4 Further Reading

10.5 Exercises

CHAPTER 11 Application Areas

11.1 Digital Printing and Imaging

11.1.1 Photo Printing Pipeline

JPEG Decompression

Scaling

Color Space Conversion

Dithering

11.1.2 Implementation and Performance

Summary

11.2 Telecom Applications

11.2.1 Voice Coding

Waveform Codecs

Vocoders

Hybrid Coders

11.2.2 Multiplexing

11.2.3 The GSM Enhanced Full-rate Codec

Implementation and Performance

11.3 Other Application Areas

11.3.1 Digital Video

MPEG-1 and MPEG-2

MPEG-4

11.3.2 Automotive

Fail-safety and Fault Tolerance

Engine Control Units

In-vehicle Networking

11.3.3 Hard Disk Drives

Motor Control

Data Decoding

Disk Scheduling and On-disk Management Tasks

Disk Scheduling and Off-disk Management Tasks

11.3.4 Networking and Network Processors

Network Processors

11.4 Further Reading

11.5 Exercises

APPENDIX A The VEX System

A.1 The VEX Instruction-set Architecture

A.1.1 VEX Assembly Language Notation

A.1.2 Clusters

A.1.3 Execution Model

A.1.4 Architecture State

A.1.5 Arithmetic and Logic Operations

Examples

A.1.6 Intercluster Communication

A.1.7 Memory Operations

A.1.8 Control Operations

Examples

A.1.9 Structure of the Default VEX Cluster

A.1.10 VEX Semantics

A.2 The VEX Run-time Architecture

A.2.1 Data Allocation and Layout

A.2.2 Register Usage

A.2.3 Stack Layout and Procedure Linkage

Procedure Linkage

A.3 The VEX C Compiler

A.3.1 Command Line Options

Output Files

Preprocessing

Optimization

Profiling

Language Definition

Libraries

Passing Options to Compile Phases

Terminal Output and Process Control

Other Options

A.3.2 Compiler Pragmas

Unrolling and Profiling

Assertions

Memory Disambiguation

Cache Control

A.3.3 Inline Expansion

Multiflow-style Inlining

C99-style Inlining

A.3.4 Machine Model Parameters

A.3.5 Custom Instructions

A.4 Visualization Tools

A.5 The VEX Simulation System

A.5.1 gprof Support

A.5.2 Simulating Custom Instructions

A.5.3 Simulating the Memory Hierarchy

A.6 Customizing the VEX Toolchain

A.6.1 Clusters

A.6.2 Machine Model Resources

A.6.3 Memory Hierarchy Parameters

A.7 Examples of Tool Usage

A.7.1 Compile and Run

A.7.2 Profiling

A.7.3 Custom Architectures

A.8 Exercises

APPENDIX B Glossary

APPENDIX C Bibliography

Index