INFORMATION THEORY & BASIC TECHNIQUE


CHAPTER 2: INFORMATION THEORY & BASIC TECHNIQUE (Universitas Telkom)

CHAPTER 2A: INFORMATION THEORY (Universitas Telkom)

Objectives
Know the terms and definitions of information value and entropy
Understand the Source Coding Theorem
Understand the Channel Coding Theorem
Know the Shannon Limit

INFORMATION VALUE
Consider a probabilistic experiment with a discrete random variable S, S = {s1, s2, ..., sN}. The amount of information produced by the event sk is:
I(sk) = log2(1 / pk) = -log2(pk)
If pk = 1 (the event is certain to occur), then I(sk) = 0: an event that is certain to occur has information value 0.
Properties of the information value:
I(sk) ≥ 0 for 0 ≤ pk ≤ 1
I(sk) > I(si) if pk < pi
An event with a smaller probability of occurring carries a larger information value when it does occur.

INFORMATION VALUE
If sk and si are independent, then: I(sk si) = I(sk) + I(si)
The logarithm base used to compute the information value in the equation above can vary. For digital systems that use binary numbers, base 2 is used.

ENTROPY
Entropy H is the average information value per symbol at the output of a given information source:
H = E[I(sk)] = -Σk pk log2(pk)  bits/symbol
For a binary source (N = 2) with symbol probabilities p and (1 - p):
H = -(p log2(p) + (1 - p) log2(1 - p))

ENTROPY
Some notes on entropy:
The "bit" as a unit of information (base-2 logarithm) is not the same as a binary digit.
Entropy connotes uncertainty: the entropy is maximum when the outcome is most uncertain. Example: for a coin toss with equal probabilities (0.5), the outcome is hard to predict (uncertain), so the entropy is maximum.

ENTROPY OF 2 EVENTS
For a binary source with probabilities p and 1 - p:
H(p) = -p log2(p) - (1 - p) log2(1 - p)

ENTROPY: EXAMPLE
Example: compute the entropy (average information value) in bits/character of the 26-letter Latin alphabet when:
a. every letter is equally likely
b. the probabilities are distributed as follows:
p = 0.10 for the letters a, e, o, t
p = 0.07 for the letters h, i, n, r, s
p = 0.02 for the letters c, d, f, l, m, p, u, y
p = 0.01 for the remaining letters
Answer:
a. H = log2(26) ≈ 4.70 bits/character
b. H = -(4 × 0.1 log2 0.1 + 5 × 0.07 log2 0.07 + 8 × 0.02 log2 0.02 + 9 × 0.01 log2 0.01) = 4.17 bits/character
(Note: there are 8 letters at p = 0.02 and 9 remaining letters at p = 0.01, so the probabilities sum to 1.)
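Both figures can be checked numerically. A short Python sketch (the `entropy` helper is mine):

```python
import math

def entropy(probs):
    """Shannon entropy in bits/symbol: H = -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# (a) 26 equally likely letters
H_uniform = entropy([1 / 26] * 26)          # equals log2(26)

# (b) the skewed distribution: 4 letters at 0.10, 5 at 0.07,
#     8 at 0.02 and 9 at 0.01, so the probabilities sum to 1
probs = [0.10] * 4 + [0.07] * 5 + [0.02] * 8 + [0.01] * 9
H_skewed = entropy(probs)

print(round(H_uniform, 2))  # 4.7
print(round(H_skewed, 2))   # 4.17
```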

DISCRETE MEMORYLESS CHANNEL (DMC)
DMC characteristics:
Discrete input
Discrete output
The channel output at any instant depends only on the channel input at that same instant, not on earlier or later inputs (there is no memory in the channel).

DISCRETE MEMORYLESS CHANNEL (DMC)
In a memoryless channel the individual symbols are independent, so the output probability is the product of the probabilities of the individual symbols:
P(Z|U) = Πm p(zm | um)
where U = (u1, u2, ..., um, ..., uN) is the input sequence and Z = (z1, z2, ..., zm, ..., zN) is the output sequence.

BINARY SYMMETRIC CHANNEL (BSC)
The BSC is a special case of the DMC in which the input and output consist of binary elements (0 and 1).
Conditional probabilities:
p(0|1) = p(1|0) = p
p(1|1) = p(0|0) = 1 - p

MUTUAL INFORMATION
Mutual information expresses the amount of information transferred when xi is transmitted and yj is received:
I(xi; yj) = log2 [ p(xi|yj) / p(xi) ]
The average mutual information is:
I(X;Y) = Σi Σj p(xi, yj) log2 [ p(xi|yj) / p(xi) ]
I(X;Y) involves both a transmitter and a receiver. The entropy H(X), by contrast, expresses the average number of bits per symbol of the source alone, so it says nothing about transmission (it is unrelated to the channel).
Using p(xi|yj) = p(xi, yj) / p(yj), the equation for I(X;Y) above can be rewritten as:
I(X;Y) = Σi Σj p(xi, yj) log2 [ p(xi, yj) / (p(xi) p(yj)) ]

EQUIVOCATION
Noting that p(xi, yj) = p(xi|yj) p(yj) and Σj p(xi, yj) = p(xi), the average mutual information can be rewritten as:
I(X;Y) = H(X) - H(X|Y)
where H(X|Y) = -Σi Σj p(xi, yj) log2 p(xi|yj) is called the equivocation.
Qualitatively: the average amount of information transferred (the mutual information) equals the source entropy H(X) minus the equivocation, which is the loss in the noisy channel.

EQUIVOCATION
The mutual information equation above can also be written as:
I(X;Y) = H(Y) - H(Y|X)

MUTUAL INFORMATION I(X;Y) ON A BSC
For a BSC with crossover probability p and input probabilities p(x1) = a, p(x0) = 1 - a:
p(y1) = (1 - a)p + a(1 - p) = a + p - 2ap
Since H(Y|X) = -(p log2(p) + (1 - p) log2(1 - p)) for a BSC, the mutual information follows from I(X;Y) = H(Y) - H(Y|X).
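This quantity is easy to evaluate numerically. A sketch under the definitions above (function names are mine):

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(a, p):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P(x=1) = a and crossover p."""
    p_y1 = a + p - 2 * a * p      # P(y=1) = (1-a)p + a(1-p)
    return h2(p_y1) - h2(p)       # for a BSC, H(Y|X) = h2(p)

# With equiprobable inputs (a = 0.5), I(X;Y) equals the capacity 1 - h2(p):
print(bsc_mutual_information(0.5, 0.1))
```

For a noiseless channel (p = 0) this gives 1 bit per use; for p = 0.5 the output is independent of the input and the mutual information drops to 0.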

Limitations in designing a DCS
The Nyquist theoretical minimum bandwidth requirement
The Shannon-Hartley capacity theorem (and the Shannon limit)
Government regulations
Technological limitations
Other system requirements (e.g. satellite orbits)

Shannon limit
Channel capacity: the maximum data rate at which error-free communication over the channel is possible.
Channel capacity of an AWGN channel (Shannon-Hartley capacity theorem):
C = W log2(1 + S/N)  bits/s
where W is the bandwidth in Hz and S/N is the signal-to-noise ratio.
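The Shannon-Hartley formula is a one-liner to evaluate; the telephone-line figures below are illustrative values, not from the slides:

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity: C = W * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative: a 3.1 kHz channel at 30 dB SNR
snr = 10 ** (30 / 10)                       # 30 dB -> linear ratio 1000
print(round(shannon_capacity(3100, snr)))   # 30898 bit/s
```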

Shannon limit
The Shannon theorem puts a limit on the transmission data rate, not on the error probability:
It is theoretically possible to transmit information at any rate R ≤ C with an arbitrarily small error probability, by using a sufficiently complicated coding scheme.
For an information rate R > C, it is not possible to find a code that can achieve an arbitrarily small error probability.

Shannon limit
[Figure: normalized capacity C/W (bits/s/Hz) versus SNR, showing the unattainable region above the capacity curve and the practical region below it]

Shannon limit
There exists a limiting value of Eb/N0 below which there can be no error-free communication at any information rate.
By increasing the bandwidth alone, the capacity cannot be increased to any desired value.

Shannon limit
[Figure: W/C (Hz per bit/s) versus Eb/N0, with the Shannon limit at -1.6 dB separating the unattainable region from the practical region]

Bandwidth efficiency plane
[Figure: bandwidth efficiency plane, R/W (bits/s/Hz) versus Eb/N0. The R = C curve (Shannon limit) separates the unattainable region (R > C) from the practical region (R < C). MPSK and MQAM (M = 2 to 256) lie in the bandwidth-limited region; MFSK (M = 2 to 16) lies in the power-limited region.]

Power and bandwidth limited systems
Two major communication resources: transmit power and channel bandwidth.
In many communication systems, one of these resources is more precious than the other. Hence, systems can be classified as:
Power-limited systems: save power at the expense of bandwidth (for example, by using coding schemes)
Bandwidth-limited systems: save bandwidth at the expense of power (for example, by using spectrally efficient modulation schemes)

CHAPTER 2B: BASIC TECHNIQUE (Universitas Telkom)

Interpixel Redundancy
There is often correlation between adjacent pixels, i.e., the values of the neighbors of an observed pixel can often be predicted from the value of the observed pixel.
Coding methods:
Run-length coding
Difference coding

Run-Length Coding
Run-length coding is a very widely used and simple compression technique which does not assume a memoryless source.
We replace runs of symbols (possibly of length one) with (run-length, symbol) pairs.
Run-length coding is a lossless technique.
For images, the maximum run length is the size of a row.

Run-Length Coding
Every code word is made up of a pair (g, l), where g is the gray level and l is the number of pixels with that gray level (the length of the run).
E.g., 56 56 56 82 82 82 83 80 80 80 80 56 56 56 56 56 produces the run-length code (56, 3)(82, 3)(83, 1)(80, 4)(56, 5).
The code is calculated row by row.
Very efficient coding for binary data.
It is important to know the position of each run, and the image dimensions must be stored with the coded image.
Used in most fax machines.

Run-Length Coding
Suppose we have the sequence of values: 1 2 2 2 1 1 3 3 3 3 3 1 1 6 6 6 6
The sequence uses 17 separate values. We could code it by saying: we have one 1, three 2's, two 1's, five 3's, two 1's, four 6's.
In run-length code this would be 1 1 3 2 2 1 5 3 2 1 4 6, taking only 12 values.
It is of no use if we don't have runs: the five values 1 5 6 8 9 would be coded as 1 1 1 5 1 6 1 8 1 9, taking ten values.

Run-Length Coding
We also have to decide and specify how many digit positions to allocate for the data value and how many for the run-length value.
In the example above the values and the run lengths are all less than 10; the spaces were inserted only to explain the principle.
If we did not know the allocation between values and run lengths, the code 113221532146 could equally mean 11 3's, 22 1's, 53 2's and 14 6's.
It would be inefficient to allocate this space without considering the original data.

Run-Length Coding
Runs with different characters: send the actual character along with the run length.
HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG
code = 7, H, 1, U, 9, F, 11, Y, 1, D, 5, G
Savings in bits (considering ASCII): ?
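One way to answer the savings question is to run the scheme. A minimal encoder/decoder sketch (function names are mine), assuming one 8-bit byte for each count and each character:

```python
def rle_encode(s):
    """Replace runs of equal symbols with [count, symbol] pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][1] == ch:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, ch])      # start a new run
    return runs

def rle_decode(runs):
    return "".join(ch * n for n, ch in runs)

data = "HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG"
runs = rle_encode(data)
print(runs)  # [[7, 'H'], [1, 'U'], [9, 'F'], [11, 'Y'], [1, 'D'], [5, 'G']]

# Savings with 8-bit ASCII, one byte per count and one per symbol:
original_bits = 8 * len(data)       # 34 characters -> 272 bits
coded_bits = 8 * 2 * len(runs)      # 6 pairs       -> 96 bits
print(original_bits - coded_bits)   # 176 bits saved
```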

Run-Length Coding: Facsimile Compression
ITU standard (A4 document, 210 by 297 mm): 1728 pixels per line.
At 1 bit per pixel, this is over 3 million bits per page.
A typical page contains many consecutive white or black pixels -> RLE.

Run-Length Coding
Run lengths may vary from 0 to 1728 -> many possibilities, and inefficiency with a fixed-size code.
Some runs occur more frequently than others; e.g., most typed pages contain 80% white pixels, and the spacing between letters is fairly consistent => the probabilities of certain runs are predictable => use a frequency-dependent code on the run lengths.

Run-Length Coding
Some facsimile compression codes (terminating codes, run length less than 64):

Pixels in run | Code: White | Code: Black
0             | 00110101    | 0000110111
1             | 000111      | 010
2             | 0111        | 11
3             | 1000        | 10
10            | 00111       | 0000100
20            | 0001000     | 00001101000

Run-Length Coding
Some facsimile compression codes (make-up codes, run length greater than or equal to 64):

Pixels in run | Code: White | Code: Black
64            | 11011       | 0000001111
128           | 10010       | 000011001000
(the table continues for 256, 512, ...)

Example, a run of 129 white pixels: the make-up code for 128 white (10010) followed by the terminating code for 1 white (000111), i.e. 10010 000111 = 11 bits instead of 129.
Savings: the codes have the no-prefix property, and compression is better for long runs.

Difference Coding
f(xi) = xi if i = 0; xi − xi−1 if i > 0
E.g.:
original: 56 56 56 82 82 82 83 80 80 80 80
code f(xi): 56 0 0 26 0 0 1 −3 0 0 0
The code is calculated row by row.
Both run-length coding and difference coding are reversible, and can be combined with, e.g., Huffman coding.
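The definition of f above maps directly into code; a small sketch (function names are mine) that reproduces the example row:

```python
def diff_encode(row):
    """f(x_i) = x_i if i == 0, else x_i - x_{i-1}."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def diff_decode(code):
    """Invert diff_encode by accumulating the differences."""
    out = [code[0]]
    for d in code[1:]:
        out.append(out[-1] + d)
    return out

original = [56, 56, 56, 82, 82, 82, 83, 80, 80, 80, 80]
code = diff_encode(original)
print(code)  # [56, 0, 0, 26, 0, 0, 1, -3, 0, 0, 0]
```

The many zeros and small values in the output are exactly what makes a follow-up entropy coder (e.g. Huffman) effective.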

SCALAR QUANTIZATION
Reduces the amount of data; a lossy compression technique.
If the data take the form of large numbers, they are converted to smaller numbers; not all of the data is used.
If the data to be compressed are analog, quantization is used to sample and digitize them into small numbers.
The smaller the numbers, the better the compression, but also the greater the loss of information.

SCALAR QUANTIZATION: EXAMPLE
8-bit data -> delete the LSB -> 7-bit data.
Input data in [0, 255] -> keep only the quantized values 0, s, 2s, ..., ks, where ks ≤ 255:
s = 3 -> output data: 0, 3, 6, 9, 12, ..., 255
s = 4 -> output data: 0, 4, 8, 12, ..., 252, 255
PCM (Pulse Code Modulation) for voice: a 4 kHz (analog) voice signal is sampled at 8000 samples/s and encoded at 8 bits per sample = 64 kbps.
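Both schemes above can be sketched in a few lines. This is a sketch under my own assumptions: `quantize` snaps to the nearest multiple of the step s (the slide does not say whether it rounds or truncates), and values are clamped to the [0, 255] range:

```python
def drop_lsb(x):
    """8-bit sample -> 7-bit sample: delete the least significant bit."""
    return x >> 1

def quantize(x, s):
    """Map x in [0, 255] to the nearest multiple of the step size s."""
    return min(255, s * round(x / s))

print(drop_lsb(0b10110101))                        # 90  (0b1011010)
print([quantize(v, 4) for v in (0, 5, 130, 255)])  # [0, 4, 128, 255]
```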

STATISTICAL METHODS
Use variable-size codes: shorter codes are assigned to symbols that appear more often (have a higher probability of occurrence).
Examples: Morse code, Huffman code, etc.

FIXED-LENGTH CODE
Each symbol is represented by a fixed-length code.
Example: ASCII code; code length: 7 bits + 1 parity bit = 8 bits.
Total number of bits = number of characters × 8 bits.

VARIABLE-SIZE CODE
Assign codes that can be decoded unambiguously, with the minimum average size.
Example:

Symbol | Probability | Code 1 | Code 2
a1     | 0.49        | 1      | 1
a2     | 0.25        | 01     | 01
a3     | 0.25        | 010    | 000
a4     | 0.01        | 001    | 001

Entropy = 1.57 bits/symbol; average code size = 1.77 bits/symbol.
If instead all symbols were equally probable (0.25), the entropy would be 2 bits/symbol.
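The two figures quoted in this example (entropy 1.57, average size 1.77) can be verified directly. A sketch, assuming p(a3) = 0.25 (the slide omits it, but it is the only value that makes the probabilities sum to 1) and the prefix-free Code 2:

```python
import math

symbols = {"a1": 0.49, "a2": 0.25, "a3": 0.25, "a4": 0.01}
code2   = {"a1": "1", "a2": "01", "a3": "000", "a4": "001"}

# Entropy: the theoretical lower bound on bits/symbol
H = -sum(p * math.log2(p) for p in symbols.values())

# Average code size: expected codeword length under this code
avg_len = sum(symbols[s] * len(code2[s]) for s in symbols)

print(round(H, 2))        # 1.57
print(round(avg_len, 2))  # 1.77
```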

Prefix Code (= Prefix-free Code)
A prefix code is a type of code system (typically a variable-length code) distinguished by its possession of the "prefix property": no valid code word in the system is a prefix (start) of any other valid code word in the set.
A receiver can therefore identify each word without requiring a special marker between words.

Prefix Code: Example 1
A code with code words {9, 59, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of both "59" and "55".
A prefix code is an example of a uniquely decodable code.

Binary Prefix Code
A binary prefix code can be represented as a binary tree.
Its distinguishing feature: every symbol is a leaf node; no symbol is an internal node.

Prefix Code: Example 2
Prefix-free code {01, 10, 11, 000, 001}.
If ni = the number of codewords of bit-length i, then:
n2 = 3 (there are 3 codewords at level 2)
n3 = 2 (there are 2 codewords at level 3)

Prefix Code: Example 3
The code {0, 01, 011, 0111} is not a prefix-free code: each codeword is a prefix of the next.
The code {0, 01, 11} is not a prefix-free code either, because "0" is a prefix of "01".

Kraft-McMillan Inequality
Theorem 1. If C is a uniquely decodable code consisting of N codewords, then the codeword lengths satisfy the inequality:
Σ (j = 1 to N) b^(-lj) = Σ (i = 1 to M) ni b^(-i) ≤ 1
where:
N = number of codewords
lj = length of the j-th codeword (in bits)
ni = number of codewords of bit-length i
b = base (here, 2)
M = maximum codeword length (M bits)

Example
{01, 10, 11, 000, 001}: 3 × 2^-2 + 2 × 2^-3 = 1 ≤ 1
{0, 01, 011, 0111}: 2^-1 + 2^-2 + 2^-3 + 2^-4 = 0.9375 ≤ 1
{0, 01, 11, 111}: 2^-1 + 2^-2 + 2^-2 + 2^-3 = 1.125 > 1, so this code cannot be uniquely decodable
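The three sums above can be checked with a one-line helper (the function name is mine):

```python
def kraft_sum(code, b=2):
    """Kraft-McMillan sum: sum of b^(-l_j) over all codeword lengths l_j."""
    return sum(b ** -len(w) for w in code)

print(kraft_sum({"01", "10", "11", "000", "001"}))  # 1.0    (satisfies KM)
print(kraft_sum({"0", "01", "011", "0111"}))        # 0.9375 (satisfies KM)
print(kraft_sum({"0", "01", "11", "111"}))          # 1.125  (violates KM)
```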

Kraft-McMillan Inequality
Any code that does not satisfy the Kraft-McMillan inequality is certainly not a uniquely decodable code.
Every uniquely decodable code satisfies the Kraft-McMillan inequality; it may be a prefix-free code, or not.
A code that satisfies the Kraft-McMillan inequality is not necessarily uniquely decodable.

Example
{0, 01, 110, 111}: decoding 01111110 is ambiguous (0·111·111·0 or 01·111·110), so the code is not uniquely decodable even though it satisfies Kraft-McMillan.
{1, 10, 110, 111}: decoding 10110110 is likewise ambiguous (e.g., 10·110·110 or 10·1·10·110).
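Ambiguity on short strings like these can be demonstrated by brute force: enumerate every way to split the string into codewords. A small sketch (the function name is mine):

```python
def parses(bits, code, prefix=()):
    """Yield every way to split `bits` into codewords from `code` (brute force)."""
    if not bits:
        yield prefix
    for w in code:
        if bits.startswith(w):
            yield from parses(bits[len(w):], code, prefix + (w,))

amb1 = list(parses("01111110", ["0", "01", "110", "111"]))
amb2 = list(parses("10110110", ["1", "10", "110", "111"]))
print(len(amb1), len(amb2))  # 2 4 -> both strings have multiple parses
```

More than one parse for a single string is enough to show a code is not uniquely decodable (a full proof for all strings would use, e.g., the Sardinas-Patterson test).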

Kraft-McMillan Inequality
Theorem 2. For any set of codeword lengths that satisfies the Kraft-McMillan inequality, i.e. Σj b^(-lj) ≤ 1, a prefix code with exactly those codeword lengths can always be constructed.

Kraft-McMillan Inequality
In other words: every prefix-free code satisfies the Kraft-McMillan inequality, and for every composition of codeword lengths that satisfies the Kraft-McMillan inequality, a prefix code can be constructed.

Example
{0, 01, 110, 111}: not uniquely decodable, but its composition of codeword lengths satisfies the KM inequality.
{0, 10, 110, 111}: a prefix code with the same composition of codeword lengths as the code above.

ASSIGNMENT #1
Write a description paper (complete with the algorithms) about the Tunstall code and the Golomb code, in Bahasa Indonesia.