William Stallings Computer Organization and Architecture 6th Edition

Chapter 12 CPU Structure and Function

CPU Structure
To understand the organization of the processor, consider what the CPU must do:
- Fetch instructions: the processor reads an instruction from memory (register, cache, main memory)
- Interpret instructions: the instruction is decoded to determine what action is required
- Fetch data: executing an instruction may require reading data from memory or an I/O module
- Process data: executing an instruction may require performing an arithmetic or logical operation on the data (ALU operation)
- Write data: the results of an execution may require writing data to memory or an I/O module
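
A minimal sketch of these five steps, assuming a hypothetical accumulator machine rather than any real CPU: 16-bit words, each instruction holding a 4-bit opcode and a 12-bit address, with opcodes 1 = LOAD, 2 = STORE, 5 = ADD and an assumed 0 = HALT. The function name `run` and the encoding are illustrative only.

```python
def run(memory: list[int], pc: int = 0) -> int:
    """Run a hypothetical accumulator machine until HALT; return the final AC."""
    ac = 0  # accumulator
    while True:
        ir = memory[pc]                        # fetch instruction
        pc += 1
        opcode, addr = ir >> 12, ir & 0x0FFF   # interpret: 4-bit opcode, 12-bit address
        if opcode == 0x0:                      # HALT (assumed encoding)
            return ac
        elif opcode == 0x1:                    # LOAD: fetch data from memory
            ac = memory[addr]
        elif opcode == 0x5:                    # ADD: fetch data, process it in the "ALU"
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == 0x2:                    # STORE: write data to memory
            memory[addr] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at address {pc - 1:#x}")


# Program: M[0x12] = M[0x10] + M[0x11]
memory = [0] * 32
memory[0:4] = [0x1010, 0x5011, 0x2012, 0x0000]  # LOAD 0x10; ADD 0x11; STORE 0x12; HALT
memory[0x10], memory[0x11] = 3, 2
run(memory)
print(memory[0x12])  # 5
```

Each loop iteration is one instruction cycle; the LOAD, ADD and STORE branches show where the fetch-data, process-data and write-data steps occur.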

CPU With Systems Bus

CPU Internal Structure

Registers
- The CPU must have some working space (temporary storage), called registers
- Number and function vary between processor designs
- One of the major design decisions
- Top level of the memory hierarchy
- Registers in the processor serve two roles:
  - User-visible registers
  - Control and status registers

User Visible Registers
Registers that can be referenced by machine-language instructions. They can be grouped into:
- General purpose registers
- Data registers
- Address registers
- Condition code registers

General Purpose Registers (1)
- May be usable as an operand by any instruction (true general purpose)
- May be restricted: only some registers can be used as operands of particular instructions
- In some cases can be used for addressing functions (register indirect, displacement)
- Can be used to hold data or addresses
  - Data: e.g. accumulator
  - Addressing: e.g. segment

General Purpose Registers (2)
- Make them general purpose
  - Increase flexibility and programmer options
  - Increase instruction size & complexity
- Make them specialized
  - Smaller (faster) instructions
  - Less flexibility

How Many General Purpose Registers?
- Between
- Fewer = more memory references
- Beyond a certain number, more registers do not noticeably reduce memory references

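How big?
- Large enough to hold a full address
- Large enough to hold a full word
- Possible to combine two data registers to hold a value wider than one register (C programming: e.g. a long int on a 16-bit machine, or a long long int on a 32-bit machine)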
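
Combining two registers amounts to treating them as the high and low halves of one wider value. The sketch below assumes 32-bit registers and uses hypothetical helper names `combine` and `split`; x86's 32-bit `MUL`, for example, leaves its 64-bit product in the EDX:EAX pair in this hi:lo arrangement.

```python
MASK32 = 0xFFFF_FFFF


def combine(hi: int, lo: int) -> int:
    """Treat two 32-bit registers as one 64-bit value, hi:lo."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def split(value: int) -> tuple[int, int]:
    """Split a 64-bit value back into its (hi, lo) 32-bit register halves."""
    return (value >> 32) & MASK32, value & MASK32


product = 0xFFFF_FFFF * 0xFFFF_FFFF   # needs more than 32 bits
hi, lo = split(product)
print(hex(hi), hex(lo))               # 0xfffffffe 0x1
assert combine(hi, lo) == product
```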

Condition Code Registers
- Sets of individual bits, e.g. result of last operation was zero
- Can be read (implicitly) by programs, e.g. jump if zero
- Can not (usually) be set by programs

Control & Status Registers
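Each processor has its own register organization and terminology. Four registers are essential to instruction execution:
- Program Counter (PC): address of the next instruction to be fetched
- Instruction Register (IR): the instruction most recently fetched
- Memory Address Register (MAR): address of a location in memory
- Memory Buffer Register (MBR): a word of data to be written to memory, or the word most recently read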

Program Status Word ( PSW )
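A set of bits containing condition codes and status information:
- Sign: the sign bit of the result of the last ALU operation
- Zero: set when the result of the ALU operation is 0
- Carry: set when an operation produced a carry (addition) or borrow (subtraction) out of the high-order bit
- Equal: set when a logical compare finds equality
- Overflow: indicates arithmetic overflow
- Interrupt enable/disable
- Supervisor: indicates supervisor or user mode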
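
How an ALU might derive the Sign, Zero, Carry and Overflow bits can be sketched for an 8-bit addition. This is an illustration under stated assumptions (8-bit two's-complement operands, a hypothetical `add8` helper, flags returned as a dict), not the PSW layout of any particular processor.

```python
def add8(a: int, b: int) -> tuple[int, dict[str, int]]:
    """8-bit add that also returns Sign, Zero, Carry and Overflow flags."""
    a, b = a & 0xFF, b & 0xFF
    raw = a + b
    result = raw & 0xFF
    flags = {
        "sign": result >> 7,                 # copy of the result's top bit
        "zero": int(result == 0),            # set when the result is 0
        "carry": int(raw > 0xFF),            # carry out of bit 7
        # signed overflow: both operands share a sign and the result has the other
        "overflow": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8(0x7F, 0x01))  # (128, {'sign': 1, 'zero': 0, 'carry': 0, 'overflow': 1})

result, flags = add8(0xFF, 0x01)   # -1 + 1 in two's complement
if flags["zero"]:                  # a "jump if zero" reads the Z flag implicitly
    print("branch taken")
```

Carry and Overflow are independent: 0xFF + 0x01 sets Carry but not Overflow, while 0x7F + 0x01 sets Overflow but not Carry.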

Other Registers
May have registers pointing to:
- Process control blocks (see O/S)
- Interrupt vectors (see O/S)

Example Register Organizations

Instruction Cycle (Review)
Stallings Chapter 3 (section 3.2): Fetch, Execute, Interrupt

Indirect Cycle
- An instruction may require several memory accesses to fetch its operands (e.g. with indirect addressing)
- This calls for an additional instruction subcycle

Instruction Cycle with Indirect

Instruction Cycle State Diagram

Data Flow (Instruction Fetch)
Depends on CPU design. In general:
Fetch
- PC contains address of next instruction
- Address moved to MAR
- Address placed on address bus
- Control unit requests memory read
- Result placed on data bus, copied to MBR, then to IR
- Meanwhile PC incremented by 1
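
The register transfers of the fetch cycle can be traced in code. This sketch assumes a word-addressed memory (so "PC incremented by 1" advances exactly one instruction) and a hypothetical `Registers` record; the address bus, data bus and control-unit signalling are folded into a list index.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    pc: int = 0   # address of the next instruction
    mar: int = 0  # memory address register
    mbr: int = 0  # memory buffer register
    ir: int = 0   # instruction register


def fetch_cycle(regs: Registers, memory: list[int]) -> None:
    """Perform the register transfers of one instruction fetch."""
    regs.mar = regs.pc            # address moved to MAR (then onto the address bus)
    regs.mbr = memory[regs.mar]   # memory read: word arrives on the data bus into MBR
    regs.pc += 1                  # hardware does this "meanwhile"; here it is sequential
    regs.ir = regs.mbr            # MBR copied to IR


memory = [0x1940, 0x5941, 0x2941]
regs = Registers(pc=0)
fetch_cycle(regs, memory)
print(f"PC={regs.pc} MAR={regs.mar} IR={regs.ir:#06x}")  # PC=1 MAR=0 IR=0x1940
```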

Data Flow (Data Fetch)
- IR is examined
- If indirect addressing, indirect cycle is performed
  - Rightmost N bits of MBR transferred to MAR
  - Control unit requests memory read
  - Result (address of operand) moved to MBR
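
A sketch of one indirect cycle, assuming the address field occupies the rightmost N = 12 bits of the instruction word (the value of N is an assumption; the slide leaves it open) and a hypothetical `indirect_cycle` helper.

```python
ADDR_BITS = 12                    # assumed N: width of the address field
ADDR_MASK = (1 << ADDR_BITS) - 1


def indirect_cycle(memory: list[int], mbr: int) -> tuple[int, int]:
    """One indirect cycle; returns the new (MAR, MBR).

    On entry MBR holds the instruction, whose rightmost N bits give the
    address of a memory word that in turn holds the operand's address.
    """
    mar = mbr & ADDR_MASK   # rightmost N bits of MBR -> MAR
    mbr = memory[mar]       # control unit requests memory read -> MBR
    return mar, mbr         # MBR now holds the operand address


memory = [0] * 16
memory[5] = 9        # word 5 is a pointer to word 9
memory[9] = 42       # the operand itself
mar, mbr = indirect_cycle(memory, mbr=0x3005)
print(mar, mbr, memory[mbr])   # 5 9 42
```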

Data Flow (Fetch Diagram)

Data Flow (Indirect Diagram)

Data Flow (Execute)
- May take many forms
- Depends on instruction being executed
- May include
  - Memory read/write
  - Input/Output
  - Register transfers
  - ALU operations

Data Flow (Interrupt)
- Simple
- Predictable
- Current PC saved to allow resumption after interrupt
  - Contents of PC copied to MBR
  - Special memory location (e.g. stack pointer) loaded to MAR
  - MBR written to memory
- PC loaded with address of interrupt handling routine
- Next instruction (first of interrupt handler) can be fetched
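
The interrupt data flow can be sketched the same way. Assumptions, not from the slides: the "special memory location" is the top of a stack that grows toward lower addresses, and the handler address is passed in directly; the helpers `interrupt_cycle` and `return_from_interrupt` are hypothetical.

```python
def interrupt_cycle(memory: list[int], pc: int, sp: int, handler: int) -> tuple[int, int]:
    """Save PC on a stack and jump to the handler; returns the new (PC, SP)."""
    mbr = pc            # contents of PC copied to MBR
    sp -= 1             # assumed: stack grows toward lower addresses
    mar = sp            # save location given by the stack pointer -> MAR
    memory[mar] = mbr   # MBR written to memory
    return handler, sp  # PC loaded with the handler address


def return_from_interrupt(memory: list[int], sp: int) -> tuple[int, int]:
    """Restore the saved PC so the interrupted program can resume."""
    return memory[sp], sp + 1


memory = [0] * 8
pc, sp = interrupt_cycle(memory, pc=40, sp=8, handler=100)
print(pc, sp, memory[sp])        # 100 7 40
pc, sp = return_from_interrupt(memory, sp)
print(pc, sp)                    # 40 8
```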

Data Flow (Interrupt Diagram)

Prefetch
- Fetch accessing main memory
- Execution usually does not access main memory
- Can fetch next instruction during execution of current instruction
- Called instruction prefetch

Improved Performance
- But not doubled:
  - Fetch usually shorter than execution
    - Prefetch more than one instruction?
  - Any jump or branch means that prefetched instructions are not the required instructions
- Add more stages to improve performance

Pipelining
Computer performance can be improved by:
- Making the electronic circuit technology faster
- Improving the organization, e.g. adding more registers, cache memory, etc.
On the computer-organization side, one such improvement is the instruction pipeline. An instruction pipeline is similar to an assembly line in a factory.

Pipelining
The simplest instruction pipeline is divided into 2 stages:
- Fetch instruction
- Execute instruction
The first stage fetches an instruction and holds it in a buffer. When the second stage is free, the first stage passes the instruction to it for execution; while the second stage executes that instruction, the first stage fetches the next one. This is called instruction prefetch or fetch overlap.
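
The two-stage scheme can be simulated to show why the speedup falls short of 2. The sketch assumes fixed fetch and execute times, a one-instruction buffer between the stages, and no branches; `two_stage_time` and `sequential_time` are illustrative names.

```python
def two_stage_time(n: int, fetch: int, execute: int) -> int:
    """Finish time of n instructions on a fetch/execute pipeline with a
    one-instruction buffer between the stages (no branches assumed)."""
    fetch_done = fetch   # the first instruction is in the buffer at t = fetch
    exec_done = 0
    for _ in range(n):
        start = max(fetch_done, exec_done)  # needs the fetched instruction and a free stage 2
        exec_done = start + execute
        fetch_done = start + fetch          # buffer emptied at `start`: fetch the next one
    return exec_done


def sequential_time(n: int, fetch: int, execute: int) -> int:
    """No overlap: every instruction is fetched, then executed."""
    return n * (fetch + execute)


for f, e in [(1, 1), (1, 2)]:
    print(f, e, sequential_time(10, f, e), two_stage_time(10, f, e))
```

With equal stage times, 10 instructions finish in 11 time units instead of 20, close to a factor of 2. When execution takes twice as long as fetch, the figures become 21 versus 30, a speedup of only about 1.43, because the fetch stage sits idle waiting for the execute stage.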

Pipelining
See the figure on the following slide.
- Execution usually takes longer than fetch, so the first (fetch) stage must wait for some time.
- A conditional branch instruction leaves the address of the next instruction to fetch unknown, so the fetch stage must wait until the second stage finishes executing the branch.
- To reduce the time lost:
  - When a conditional branch instruction is passed to the second stage, the first stage immediately fetches the next instruction in sequence
  - If the branch is not taken, no time is lost
  - If the branch is taken, the fetched instruction is discarded and the new instruction is fetched

Two Stage Instruction Pipeline

Pipelining
The two-stage pipeline above already gives some speedup. To increase the speedup further, instruction processing is broken into more stages:
- Fetch instruction (FI)
- Decode instruction (DI)
- Calculate operands (CO)
- Fetch operands (FO)
- Execute instruction (EI)
- Write result (WO)
Overlap these operations.

Timing of Pipeline

From the figure above:
- A six-stage pipeline reduces the execution time for 9 instructions from 54 time units (9 × 6) to 14 time units (6 + 9 - 1)
- If the six stages take different amounts of time, the faster stages must wait for the slower ones
- Another problem is the conditional branch instruction; see the figure below, where instruction 3 branches to instruction 15

Branch in a Pipeline

Six Stage Instruction Pipeline

Alternative Pipeline

Speedup Factors with Instruction Pipelining
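
The 54 and 14 time-unit figures quoted for 9 instructions on a six-stage pipeline follow from a simple count, sketched below under the assumptions that every stage takes one time unit and there are no branches or stalls: a k-stage pipeline needs k units for the first instruction and one more unit for each of the remaining n - 1. The function names are illustrative.

```python
def sequential_time(n: int, k: int) -> int:
    """Time units for n instructions when each passes all k stages alone."""
    return n * k


def pipelined_time(n: int, k: int) -> int:
    """k units to fill the pipeline, then one instruction completes per unit."""
    return k + (n - 1) if n > 0 else 0


n, k = 9, 6
print(sequential_time(n, k), pipelined_time(n, k))             # 54 14
print(round(sequential_time(n, k) / pipelined_time(n, k), 2))  # 3.86
```

As n grows, the ratio nk / (k + n - 1) approaches k, the ideal speedup of a k-stage pipeline.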

Dealing with Branches
Approaches to handling branches:
- Multiple streams
- Prefetch branch target
- Loop buffer
- Branch prediction
- Delayed branching

Multiple Streams
- Both paths of a branch are fetched using two streams (have two pipelines)
- Prefetch each branch path into a separate pipeline
- Use the appropriate pipeline once the branch is resolved
- Problems with this approach:
  - Contention for, and delays in, register and memory access
  - Additional branch instructions may enter the pipeline before the original branch decision is resolved, which this scheme cannot handle
- Machines using this approach: IBM 370/168 and IBM 3033

Prefetch Branch Target
- Both the instruction following the branch and the branch target are prefetched
- Used by: IBM 360/91
- Problem: extra buffers and registers are needed to hold the prefetched instructions

Loop Buffer
- Uses very fast memory
- Very good for small loops or jumps
- Used by CRAY-1
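
A loop buffer can be sketched as a small store that the fetch stage checks before going to main memory. The model below is a simplification with assumed details: it keeps the `size` most recently fetched instructions in a dictionary keyed by address and evicts the oldest on overflow; `LoopBuffer` and `run_loop` are hypothetical names.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the `size` most recently fetched instructions, keyed by address."""

    def __init__(self, memory: list[int], size: int = 8) -> None:
        self.memory = memory
        self.size = size
        self.buffer = OrderedDict()   # address -> instruction word, oldest first
        self.memory_reads = 0

    def fetch(self, addr: int) -> int:
        if addr in self.buffer:               # hit: no main-memory access
            return self.buffer[addr]
        self.memory_reads += 1                # miss: read main memory
        word = self.memory[addr]
        self.buffer[addr] = word
        if len(self.buffer) > self.size:
            self.buffer.popitem(last=False)   # evict the oldest fetched instruction
        return word


def run_loop(size: int, iterations: int = 10) -> int:
    """Fetch a 4-instruction loop body (addresses 10-13) repeatedly."""
    lb = LoopBuffer(memory=list(range(32)), size=size)
    for _ in range(iterations):
        for addr in range(10, 14):
            lb.fetch(addr)
    return lb.memory_reads


print(run_loop(size=8), run_loop(size=2))   # 4 40
```

With an 8-entry buffer, a 4-instruction loop run 10 times costs only 4 main-memory reads; with a 2-entry buffer the loop does not fit and every one of the 40 fetches goes to memory.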

Branch Prediction

Branch Prediction (1)
Various techniques are used to predict whether a branch will be taken, including:
- Predict never taken
  - Assume that jump will not happen
  - Always fetch next instruction
  - 68020 & VAX 11/780
  - VAX will not prefetch after branch
- Predict always taken
  - Assume that jump will happen
  - Always fetch target instruction
  - Studies show that more than 50% of branches are taken
  - As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target address
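
The two static strategies can be compared on a branch-outcome trace. The trace here is hypothetical (the closing branch of a 10-iteration loop) and `accuracy` is an illustrative helper; real hit rates depend on the workload.

```python
def accuracy(prediction: bool, outcomes: list[bool]) -> float:
    """Fraction of branches predicted correctly by a fixed (static) guess."""
    hits = sum(prediction == taken for taken in outcomes)
    return hits / len(outcomes)


# Hypothetical trace: the closing branch of a 10-iteration loop
# is taken 9 times and falls through once.
loop_branch = [True] * 9 + [False]

print(accuracy(False, loop_branch))  # predict never taken: 0.1
print(accuracy(True, loop_branch))   # predict always taken: 0.9
```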

Branch Prediction (2)
- Predict by opcode
  - The processor assumes the jump will be taken only for certain opcodes
  - Can get up to 75% success
- Taken/Not taken switch
  - Based on previous history
  - Good for loops
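
A taken/not-taken switch based on previous history can be modelled with a single bit that remembers the last outcome. This sketch assumes a hypothetical trace of a 10-iteration loop that is entered twice; `one_bit_hits` is an illustrative name.

```python
def one_bit_hits(outcomes: list[bool], initial: bool = True) -> int:
    """Taken/not-taken switch: predict whatever the branch did last time."""
    last = initial
    hits = 0
    for taken in outcomes:
        hits += (last == taken)   # was the prediction correct?
        last = taken              # flip the switch to the latest outcome
    return hits


# Hypothetical trace: a 10-iteration loop entered twice.
trace = ([True] * 9 + [False]) * 2
print(one_bit_hits(trace), "of", len(trace))   # 17 of 20
```

After the first visit, the switch mispredicts twice per loop visit: once at loop exit and once again when the loop is re-entered, because the last recorded outcome was "not taken".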

Branch Prediction (3)
Branch History Table
- Stores the history of recently executed branch instructions

Branch Prediction Flowchart

Branch Prediction State Diagram
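
A branch history table is often built from 2-bit saturating counters, one per recently executed branch, so that a single wrong guess does not flip the prediction. The sketch below uses that common variant; the transitions in the state diagram may differ in detail, and the class name, the initial "strongly taken" state, and the trace are assumptions.

```python
class BranchHistoryTable:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial: int = 3) -> None:
        self.initial = initial
        self.counters = {}   # branch instruction address -> counter (0..3)

    def predict(self, addr: int) -> bool:
        return self.counters.get(addr, self.initial) >= 2

    def update(self, addr: int, taken: bool) -> None:
        c = self.counters.get(addr, self.initial)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def bht_hits(trace: list[tuple[int, bool]]) -> int:
    """Count correct predictions over a trace of (branch address, taken) pairs."""
    bht = BranchHistoryTable()
    hits = 0
    for addr, taken in trace:
        hits += (bht.predict(addr) == taken)
        bht.update(addr, taken)
    return hits


# Hypothetical trace: a 10-iteration loop entered twice; the branch sits at made-up address 0x40.
trace = [(0x40, t) for t in ([True] * 9 + [False]) * 2]
print(bht_hits(trace), "of", len(trace))   # 18 of 20
```

Only the two loop exits are mispredicted: after the first exit the counter drops from 3 to 2, which still predicts taken, so re-entering the loop is predicted correctly. A single-bit switch would also miss that re-entry.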

Dealing With Branches

Intel 80486 Pipelining
- Fetch
  - From cache or external memory
  - Put in one of two 16-byte prefetch buffers
  - Fill buffer with new data as soon as old data consumed
  - Average 5 instructions fetched per load
  - Independent of other stages to keep buffers full
- Decode stage 1
  - Opcode & address-mode info
  - At most first 3 bytes of instruction
  - Can direct D2 stage to get rest of instruction
- Decode stage 2
  - Expand opcode into control signals
  - Computation of complex address modes
- Execute
  - ALU operations, cache access, register update
- Writeback
  - Update registers & flags
  - Results sent to cache & bus interface write buffers

80486 Instruction Pipeline Examples

Pentium 4 Registers

EFLAGS Register

Control Registers

MMX Register Mapping
- MMX uses several 64 bit data types
- Use 3 bit register address fields
  - 8 registers
- No MMX specific registers
  - Aliasing to lower 64 bits of existing floating point registers

Pentium Interrupt Processing
- Interrupts
  - Maskable
  - Nonmaskable
- Exceptions
  - Processor detected
  - Programmed
- Interrupt vector table
  - Each interrupt type assigned a number
  - Index to vector table
  - 256 * 32 bit interrupt vectors
- 5 priority classes
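
Indexing a vector table is a scale-and-add: the interrupt number times the vector size, plus the table's base. This sketch models the slide's 256 × 32-bit table with an assumed base address of 0; it does not model the Pentium's protected-mode Interrupt Descriptor Table, whose entries are 8-byte gate descriptors. `vector_address` is a hypothetical helper.

```python
VECTOR_SIZE = 4        # bytes per vector: 32-bit vectors, as on the slide
NUM_VECTORS = 256


def vector_address(interrupt_number: int, table_base: int = 0) -> int:
    """Byte address of the vector for an interrupt number (table_base is assumed)."""
    if not 0 <= interrupt_number < NUM_VECTORS:
        raise ValueError("interrupt number must be 0..255")
    return table_base + interrupt_number * VECTOR_SIZE


print(hex(vector_address(0x21)))           # 0x84
print(NUM_VECTORS * VECTOR_SIZE, "bytes")  # 1024 bytes
```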

PowerPC User Visible Registers

PowerPC Register Formats

MMX Register Mapping Diagram

PROBLEM
A non-pipelined processor has 6 instruction execution stages, taking 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns respectively. Compute:
(a) the instruction latency
(b) the total time to execute 100 instructions
Answer:
(a) 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
(b) 100 × 320 = 32,000 ns

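PROBLEM
Continuing the problem above, suppose the processor is pipelined and each transfer from one stage to the next adds an overhead of 5 ns. Compute the time needed to execute 100 instructions.
Answer:
Length of a pipeline stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns
A single instruction now takes 6 × 65 = 390 ns from start to finish (its latency), but once the pipeline is full one instruction completes every 65 ns.
Time to execute 100 instructions = 65 × 6 + 65 × 99 = 390 + 6,435 = 6,825 ns (the first instruction fills the pipeline; each of the remaining 99 completes one cycle later)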

From the two problems above, compute the speedup.
Answer:
Average instruction time, not pipelined = 320 ns
Average instruction time, pipelined = 65 ns
Speedup = 320 / 65 ≈ 4.92
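
The arithmetic of both problems can be checked in a few lines. The 5 ns overhead is inferred from the worked answer (65 ns minus the 60 ns longest stage); the variable names are illustrative.

```python
stages_ns = [50, 50, 60, 60, 50, 50]
overhead_ns = 5          # implied by the answer: 65 ns - 60 ns
n = 100                  # instructions to execute
k = len(stages_ns)       # 6 stages

latency = sum(stages_ns)                        # 320 ns per instruction, unpipelined
unpipelined_total = n * latency                 # 32,000 ns

cycle = max(stages_ns) + overhead_ns            # 65 ns per pipeline stage
pipelined_total = cycle * k + cycle * (n - 1)   # first instruction + 99 more = 6,825 ns

print(latency, unpipelined_total, cycle, pipelined_total)  # 320 32000 65 6825
print(round(latency / cycle, 2))                           # 4.92 (steady state)
print(round(unpipelined_total / pipelined_total, 2))       # 4.69 (exactly 100 instructions)
```

The 4.92 figure is the steady-state speedup per instruction. For exactly 100 instructions the pipeline-fill time also counts, giving 32,000 / 6,825 ≈ 4.69.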
