Agung Toto Wibowo Universitas Telkom


Basic Techniques
Agung Toto Wibowo, Universitas Telkom

Outline
Information and Entropy
Compression Performance
Run Length Encoding
Move to Front
Scalar Quantization
Prefix Code

Information Value
Consider a probabilistic experiment with a discrete random variable S that takes values in {s_1, s_2, ..., s_N}, where event s_k occurs with probability p_k.
The amount of information produced by event s_k is:
$I(s_k) = \log \frac{1}{p_k} = -\log p_k$
If $p_k = 1$ (the event is certain), then $I(s_k) = 0$: an event that is certain to occur carries no information.
Properties of the information value:
$I(s_k) \ge 0$ for $0 \le p_k \le 1$
$I(s_k) > I(s_i)$ if $p_k < p_i$
An event that is less likely to occur carries more information when it does occur.

Information Value
If s_k and s_i are independent, then $I(s_k s_i) = I(s_k) + I(s_i)$.
The logarithm base used to compute the information value can vary. Digital systems work with binary numbers, so base 2 is used.
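The properties above can be checked numerically. This is a minimal sketch that assumes the usual definition I = log2(1/p) with base 2; the function name `information` is chosen here for illustration.

```python
import math

def information(p: float) -> float:
    """Information value I = log2(1/p), in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)

# A certain event carries no information.
print(information(1.0))
# A rarer event carries more information than a likely one.
print(information(0.1), information(0.9))
# Independent events: the information values add up.
print(information(0.5 * 0.25), information(0.5) + information(0.25))
```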

Entropy
The entropy H is the average information per symbol produced by a given information source:
$H = E[I(s_k)] = -\sum_{k=1}^{N} p_k \log_2 p_k$ bits/symbol
For a binary signal (N = 2) with symbol probabilities p and 1 - p:
$H = -\left(p \log_2 p + (1 - p) \log_2 (1 - p)\right)$

Entropy: Notes
The bit used as the unit of information (base 2) is not the same thing as a binary digit.
Entropy expresses uncertainty: it is largest when the outcome is hardest to predict.
Example: for a coin toss with equal probabilities (0.5 each), the outcome is hard to guess (uncertain), so the entropy is at its maximum.

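Entropy of Two Events
For a binary signal with symbol probabilities p and 1 - p, H depends only on p. It is 0 when p = 0 or p = 1 (no uncertainty) and reaches its maximum of 1 bit/symbol at p = 0.5.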
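The shape of the two-event entropy can be tabulated with a few lines of Python. This is only a sketch of the binary formula H = -(p log2 p + (1 - p) log2 (1 - p)); the convention that H = 0 at p = 0 and p = 1 is made explicit in the code.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits/symbol of a binary source with probabilities p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome: no uncertainty, by convention 0 * log 0 = 0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# The curve is symmetric around p = 0.5, where it peaks.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))
```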

Entropy: Example
Compute the entropy (average information) in bits/character of the 26-letter Latin alphabet when:
a. every letter is equally likely;
b. the letter probabilities are distributed as follows:
   p = 0.10 for the letters a, e, o, t
   p = 0.07 for the letters h, i, n, r, s
   p = 0.02 for the letters c, d, f, l, m, p, u, y
   p = 0.01 for the remaining 9 letters
Answer:
a. $H = \log_2 26 \approx 4.70$ bits/character
b. $H = -(4 \times 0.10 \log_2 0.10 + 5 \times 0.07 \log_2 0.07 + 8 \times 0.02 \log_2 0.02 + 9 \times 0.01 \log_2 0.01) \approx 4.17$ bits/character
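The example can be reproduced with a general entropy function. In this sketch the probability lists are built from the letter groups in the exercise (4, 5, 8, and 9 letters, which sum to probability 1); the names `uniform` and `skewed` are illustrative.

```python
import math

def entropy(probs):
    """Entropy in bits/symbol of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Case a: 26 equally likely letters.
uniform = [1 / 26] * 26
# Case b: 4 letters at 0.10, 5 at 0.07, 8 at 0.02, and the remaining 9 at 0.01.
skewed = [0.10] * 4 + [0.07] * 5 + [0.02] * 8 + [0.01] * 9

print(round(entropy(uniform), 2))  # 4.7
print(round(entropy(skewed), 2))   # 4.17
```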

Performance - Compression Ratio
Compression ratio = size of the output stream / size of the input stream
The compression ratio is also called bpb (bits per bit): the number of bits the compressed stream needs, on average, to represent one bit of the input stream.
A value of 0.6 means the data occupies 60% of its original size after compression. A value greater than 1 means the output stream is larger than the input stream.
Related terms are bpp (bits per pixel) and bpc (bits per character); the general term is bit rate.

Performance - Compression Factor
The inverse of the compression ratio is called the compression factor.
A value greater than 1 indicates compression, and a value less than 1 indicates expansion.
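Both measures are simple quotients of stream sizes. The sketch below assumes sizes are given in the same unit (bits or bytes); the function names are illustrative.

```python
def compression_ratio(input_size: int, output_size: int) -> float:
    """Output size divided by input size; below 1 means the data shrank."""
    return output_size / input_size

def compression_factor(input_size: int, output_size: int) -> float:
    """Inverse of the compression ratio; above 1 means the data shrank."""
    return input_size / output_size

# A 1000-byte input compressed to 600 bytes.
print(compression_ratio(1000, 600))
print(compression_factor(1000, 600))
```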

Run Length Encoding
Idea: if a data item d occurs n consecutive times in the input stream, replace the n occurrences with the single pair nd.
This approach is called run length encoding (RLE).

RLE Text Compression
Encoding “2._all_is_too_well” as “2._a2_is_t2_we2” does not work. The decompressor must be able to tell that the first 2 is part of the text, while the others are repetition factors for “l” and “o”.
“2._a2l_is_t2o_we2l” does not solve the problem. The output is as long as the input, and the ambiguity remains.
“2._a@2l_is_t@2o_we@2l” uses a special escape character. It removes the ambiguity but produces a text longer than the input.

Simple RLE Compression
charCount ← 0, repeatCount ← 0
read next character (CH)
while not end of string do
    increment charCount
    if charCount = 1 then
        savedCharacter ← CH, repeatCount ← 1
    else if savedCharacter = CH then
        increment repeatCount
    else
        if repeatCount < 4 then
            write savedCharacter repeatCount times
        else
            write compressed format (3 characters)
        repeatCount ← 1, savedCharacter ← CH
    read next character (CH)
at end of string, write the last saved run in the same way
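A runnable version of the compression steps might look like the sketch below. It assumes the 3-character compressed format is the escape character "@", a one-digit count, and the repeated character, so runs are capped at 9; it also assumes "@" never occurs in the input. These choices are assumptions made for this example.

```python
ESCAPE = "@"   # escape character, assumed absent from the input text
MIN_RUN = 4    # shorter runs are copied literally, as in the pseudocode
MAX_RUN = 9    # assumption: the count is stored as a single decimal digit

def rle_compress(text: str) -> str:
    """Replace each run of MIN_RUN or more equal characters with ESCAPE + count + character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Extend the run while the same character repeats, up to MAX_RUN.
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        if run < MIN_RUN:
            out.append(ch * run)              # too short to be worth compressing
        else:
            out.append(f"{ESCAPE}{run}{ch}")  # compressed format, 3 characters
        i += run
    return "".join(out)

print(rle_compress("aaaaabbbc" + "d" * 12))
print(rle_compress("2._all_is_too_well"))
```

With a threshold of 4, the sample text "2._all_is_too_well" is left unchanged, because none of its runs is long enough to pay for the 3-character format.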

Simple RLE Decompression
repeat
    compressionFlag ← off
    while compressionFlag = off and not end of string do
        read next character (CH)
        if CH = ‘@’ then
            compressionFlag ← on
        else
            write CH to the output stream
    if compressionFlag = on then
        read nRepetition, read dChar
        write dChar nRepetition times
until end of string
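The decompression loop can be sketched in Python as follows. As an assumption for this example, the repetition count is a single decimal digit that follows the "@" escape character, matching strings such as "t@2o" from the text compression slide.

```python
ESCAPE = "@"  # escape character that announces a (count, character) pair

def rle_decompress(data: str) -> str:
    """Expand every ESCAPE + digit + character triple; copy everything else."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESCAPE:
            count = int(data[i + 1])   # assumption: one-digit repetition count
            out.append(data[i + 2] * count)
            i += 3
        else:
            out.append(ch)             # literal character
            i += 1
    return "".join(out)

print(rle_decompress("2._a@2l_is_t@2o_we@2l"))
```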

RLE on Binary Image
RLE image compression is based on the observation that if we select a pixel in the image at random, there is a good chance that its neighbours have the same color.
E.g. if a bitmap starts with 17 white pixels, followed by 1 black pixel, followed by 55 white pixels, then only the numbers 17, 1, 55, ... need to be written to the output.
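For a single row of a binary image, the run lengths can be extracted as in this sketch. Pixels are represented as the characters "W" and "B", and the row is assumed to start with white; if it starts with black, a leading run of length 0 is emitted. Both conventions are assumptions made for this example.

```python
from itertools import groupby

def run_lengths(pixels: str, first: str = "W") -> list:
    """Lengths of consecutive runs of equal pixels, assuming the row starts with `first`."""
    runs = [len(list(group)) for _, group in groupby(pixels)]
    if pixels and pixels[0] != first:
        runs.insert(0, 0)  # zero-length run of the assumed starting color
    return runs

row = "W" * 17 + "B" + "W" * 55
print(run_lengths(row))  # [17, 1, 55]
```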

RLE on Gray Scale Image
Each run of pixels with the same intensity is encoded as a pair (run length, pixel value), with runs of up to 255 pixels.
E.g. 12, 12, 12, 12, 12, 12, 12, 12, 12, 35, 76, 112, 67, 87, 87, 87, 5, 5, 5, 5, 5, 5, 1, ... is compressed into 9, 12, 35, 76, 112, 67, 3, 87, 6, 5, 1, ... The problem is how to distinguish a count from a grayscale value:
If the image is limited to 128 grayscale values, one bit of each byte can indicate whether the byte holds a grayscale value or a count.
If the image has 256 grayscale values, they can be reduced to 255, with one byte value (e.g. 255) reserved as a flag that precedes every count: 255, 9, 12, 35, 76, 112, 67, 255, 3, 87, 255, 6, 5, 1, ...
One bit can be devoted to each byte to indicate whether the byte is a count or grayscale data, with these bits collected in groups of 8 and each group written before the 8 bytes it describes: 10000010, 9, 12, 35, 76, 112, 67, 3, 87, 100....., 6, 5, 1, ...
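The flag-byte variant can be sketched as below. The code assumes pixel values lie in 0..254 so that 255 is free to act as the flag, and it compresses runs of 3 or more pixels, which is the threshold implied by the example sequence; the parameter name `min_run` is chosen here.

```python
from itertools import groupby

FLAG = 255  # reserved byte value; pixel values are assumed to lie in 0..254

def rle_gray(pixels, min_run=3):
    """Encode runs of at least min_run equal pixels as FLAG, count, value."""
    out = []
    for value, group in groupby(pixels):
        remaining = len(list(group))
        while remaining > 0:
            chunk = min(remaining, 255)      # a count must fit in one byte
            if chunk >= min_run:
                out += [FLAG, chunk, value]
            else:
                out += [value] * chunk       # short run: copy the pixels as they are
            remaining -= chunk
    return out

pixels = [12] * 9 + [35, 76, 112, 67] + [87] * 3 + [5] * 6 + [1]
print(rle_gray(pixels))
```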

RLE on True Color Image
In a color image each pixel is stored as three bytes (the R, G, and B values), and each color component should be encoded separately.
E.g. the pixels (171, 85, 34), (172, 85, 35), (172, 85, 30), (173, 85, 33), ... should be separated into (171, 172, 172, 173, ...), (85, 85, 85, 85, ...), and (34, 35, 30, 33, ...).
It is preferable to encode each row individually. Thus, if a row ends with four pixels of intensity 87 and the next row starts with 9 such pixels, it is better to write ..., 4, 87, 9, 87, ..., or even better ..., 4, 87, eol, 9, 87, ..., where eol is an end-of-line code.

Exercises - RLE
What is the compression ratio?
[Figure: exercise image; only the labels 1 to 10 and the values 24, 14, 19, 23, 21, 13, 20 survive]

Relative Encoding
Another technique, called differencing, is used when the data to be compressed consists of numbers that do not differ by much from their predecessors.
E.g. in telemetry, temperature readings (70, 71, 72, 72.5, 73.1, ...) can be expressed as (70, 1, 1, 0.5, 0.6, ...).
Differences can be negative, and a value that differs too much from its predecessor is sent as is: (110, 115, 121, 119, 200, 202, ...) is compressed into (110, 5, 6, -2, 200, 2, ...).
Extra bits distinguish raw values from differences. E.g. (110, 5, 6, -2, 200, 2, ...) is sent with the extra bits 10001000 (binary), where 1 marks a raw value and 0 marks a difference.
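Relative encoding with flag bits can be sketched as follows. The threshold `max_delta`, above which a value is sent raw instead of as a difference, is a hypothetical parameter introduced for this example; 15 is chosen so that the sample sequence behaves as on the slide.

```python
def relative_encode(values, max_delta=15):
    """Return (items, flags); flags[i] is 1 for a raw value and 0 for a difference."""
    items, flags = [], []
    previous = None
    for v in values:
        if previous is None or abs(v - previous) > max_delta:
            items.append(v)             # first value or large jump: send raw
            flags.append(1)
        else:
            items.append(v - previous)  # small change: send the difference
            flags.append(0)
        previous = v
    return items, flags

def relative_decode(items, flags):
    """Rebuild the original values from raw values and differences."""
    values = []
    for item, flag in zip(items, flags):
        values.append(item if flag else values[-1] + item)
    return values

items, flags = relative_encode([110, 115, 121, 119, 200, 202])
print(items, flags)
print(relative_decode(items, flags))
```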

Difference coding : Example


Move to Front Coding [1]
Idea: maintain the alphabet A of symbols as a list in which frequently occurring symbols are located near the front. Each symbol is encoded as its current position in the list and is then moved to the front.
E.g. with the alphabet A = (“a”, “b”, “c”, “d”, “m”, “n”, “o”, “p”), the input stream “abcddcbamnopponm” is encoded as C = (0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3). Without move to front it is encoded as C’ = (0, 1, 2, 3, 3, 2, 1, 0, 4, 5, 6, 7, 7, 6, 5, 4).

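Move to Front Coding [2]
Compare the strings “abcddcbamnopponm” and “abcdmnopabcdmnop”. Move to front pays off on the first string, where each symbol is repeated shortly after it appears, but not on the second, where a symbol reappears only after all the others.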
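The two strings can be compared with a short move-to-front sketch. The function names are illustrative; the alphabet is the 8-symbol list from the example.

```python
def mtf_encode(text, alphabet):
    """Encode each symbol as its current list position, then move it to the front."""
    symbols = list(alphabet)
    codes = []
    for ch in text:
        i = symbols.index(ch)
        codes.append(i)
        symbols.insert(0, symbols.pop(i))
    return codes

def mtf_decode(codes, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    symbols = list(alphabet)
    out = []
    for i in codes:
        ch = symbols.pop(i)
        out.append(ch)
        symbols.insert(0, ch)
    return "".join(out)

ALPHABET = "abcdmnop"
for text in ("abcddcbamnopponm", "abcdmnopabcdmnop"):
    codes = mtf_encode(text, ALPHABET)
    print(text, codes, sum(codes) / len(codes))
```

The first string gives an average code value of 2.5, against 3.5 without move to front. The second gives 5.25 against 3.5, so move to front makes it worse.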

Move to Front Coding [3]
Move to front can be combined with another method (Huffman or arithmetic coding) in one of the following ways:
1. Assign Huffman codes to the integers in the range [0, n] such that smaller integers get shorter codes, e.g. 0 - 0, 1 - 10, 2 - 110, 3 - 1110, 4 - 11110, 5 - 111110, 6 - 1111110, 7 - 1111111.
2. Assign codes to the integers such that the code of integer i ≥ 1 is its binary code preceded by ⌊log2 i⌋ zeros.
3. Use adaptive Huffman coding.
4. For maximum compression, perform two passes over C: the first pass counts the frequencies of the codes, and the second pass performs the actual encoding.
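The second option can be illustrated with a few lines of Python. This sketch assumes the code of i is the binary representation of i preceded by ⌊log2 i⌋ zeros (the construction used by the Elias gamma code); the function name is chosen here.

```python
def zero_prefixed_code(i: int) -> str:
    """Binary representation of i >= 1, preceded by floor(log2(i)) zeros."""
    if i < 1:
        raise ValueError("i must be at least 1")
    binary = bin(i)[2:]                       # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary   # len(binary) - 1 equals floor(log2(i))

for i in range(1, 9):
    print(i, zero_prefixed_code(i))
```

The number of leading zeros tells the decoder how many bits follow the first 1, so the codes can be concatenated without separators. Because the code is defined for i ≥ 1, move-to-front positions that start at 0 would have to be shifted by one before encoding.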

Move to Front Coding [4]
Move-ahead-k. The element of A matched by the current symbol is moved ahead k positions instead of all the way to the front of A. The parameter k is specified by the user. If k = n, this is the same as move to front.
Wait-c-and-move. An element of A is moved to the front only after it has been matched c times to symbols from the input stream.
It may also make sense to treat each word, rather than each character, as a symbol.
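The move-ahead-k variant needs only a small change to the list update. In this sketch (names are illustrative) a matched symbol moves k positions toward the front, stopping at position 0.

```python
def move_ahead_k_encode(text, alphabet, k):
    """Encode each symbol as its list position, then move it k positions ahead."""
    symbols = list(alphabet)
    codes = []
    for ch in text:
        i = symbols.index(ch)
        codes.append(i)
        # Move the symbol ahead by k places, but never past the front of the list.
        symbols.insert(max(0, i - k), symbols.pop(i))
    return codes

text, alphabet = "abcddcbamnopponm", "abcdmnop"
print(move_ahead_k_encode(text, alphabet, 0))  # k = 0: the list never changes
print(move_ahead_k_encode(text, alphabet, 1))
print(move_ahead_k_encode(text, alphabet, 8))  # k = n: plain move to front
```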

Exercises - Move to Front
Consider the following lyrics: Selamat Hari Raya selamat hari lebaran Selamat Idul Fitri maaf lahir dan batin Selamat Hari Raya selamat lebaran Selamat Idul Fitri maaf lahir dan batin
If the buffer holds 6 words, what is the compression ratio?

Scalar Quantization
Reduces the amount of data; it is a lossy compression method.
If the data consists of large numbers, they are converted to smaller numbers, so not all of the data is kept.
If the data to be compressed is analog, quantization is used to sample and digitize it into small numbers.
The smaller the numbers, the better the compression, but also the greater the loss of information.

Scalar Quantization - Example
8-bit data → delete the least significant bit (LSB) → 7-bit data
Input data in [0, 255] → keep only the quantized values 0, s, 2s, ..., ks, 255, where ks ≤ 255 < (k + 1)s
s = 3 → output values: 0, 3, 6, 9, 12, ..., 255
s = 4 → output values: 0, 4, 8, 12, ..., 252, 255
PCM (pulse code modulation) for voice: a 4 kHz analog voice signal is sampled at 8000 samples/s and encoded with 8 bits per sample, giving 64 kbps.
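The quantization levels in this example can be generated and applied as in the sketch below. Mapping each input to the nearest level is an assumption of this example (the slide only lists the levels), and the function names are illustrative.

```python
def quantization_levels(s: int, top: int = 255) -> list:
    """Uniform levels 0, s, 2s, ..., ks, with `top` appended if it is not a multiple of s."""
    levels = list(range(0, top + 1, s))
    if levels[-1] != top:
        levels.append(top)
    return levels

def quantize(x: int, levels: list) -> int:
    """Map x to the nearest level (assumption: nearest-value rounding)."""
    return min(levels, key=lambda q: abs(q - x))

def drop_lsb(x: int) -> int:
    """Turn an 8-bit value into a 7-bit value by deleting the least significant bit."""
    return x >> 1

print(quantization_levels(3)[:5], quantization_levels(3)[-1])
print(quantization_levels(4)[-2:])
print(quantize(11, quantization_levels(4)))
print(drop_lsb(255))
print(8000 * 8)  # PCM bit rate in bits per second
```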

Statistical Method
Uses variable-size codes: shorter codes are assigned to symbols that appear more often (have a higher probability of occurrence).
Examples: Morse code, Huffman code, etc.

Fixed Length Code
Each symbol is represented by a code of fixed length.
Example: ASCII code → code length: 7 bits + 1 parity bit = 8 bits
Total number of bits = number of characters × 8 bits

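Variable Size Code
Goals: assign codes that can be decoded unambiguously, and assign codes with the minimum average size.
Example:
Symbol  Probability  Code 1  Code 2
A1      0.49         1       1
A2      0.25         01      01
A3      0.25         010     000
A4      0.01         001     001
Entropy = 1.57 bits/symbol
Average code size = 1.77 bits/symbol
If the symbols had equal probabilities (0.25 each), the entropy would be 2 bits/symbol.
Code 1 cannot be decoded unambiguously (01001 can be read as A2 A4 or as A3 A2); Code 2 can.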
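The two figures in this example can be recomputed as follows. The sketch assumes the table reads A1 = 0.49, A2 = 0.25, A3 = 0.25, A4 = 0.01 with Code 2 = {1, 01, 000, 001}; these values are a reconstruction of the flattened table, chosen because they sum to 1 and reproduce the stated 1.57 and 1.77.

```python
import math

# Assumed reconstruction of the slide's table (see the note above).
PROBS = {"A1": 0.49, "A2": 0.25, "A3": 0.25, "A4": 0.01}
CODE_2 = {"A1": "1", "A2": "01", "A3": "000", "A4": "001"}

def entropy(probs: dict) -> float:
    """Entropy in bits/symbol of a symbol -> probability mapping."""
    return sum(p * math.log2(1 / p) for p in probs.values() if p > 0)

def average_code_size(probs: dict, code: dict) -> float:
    """Expected codeword length in bits/symbol."""
    return sum(p * len(code[symbol]) for symbol, p in probs.items())

print(round(entropy(PROBS), 2))                    # 1.57
print(round(average_code_size(PROBS, CODE_2), 2))  # 1.77
```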

Prefix Code (= Prefix-free Code)
A prefix code is a type of code system (typically a variable-length code) distinguished by its possession of the "prefix property", which states that no valid code word in the system is a prefix (start) of any other valid code word in the set.
As a result, a receiver can identify each word without requiring a special marker between words.

Prefix Code: Example 1
A code with code words {9, 59, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of both "59" and "55".
A prefix code is an example of a uniquely decodable code.
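The prefix property is easy to test mechanically. The sketch below sorts the code words so that any word that is a prefix of another ends up directly before a word that starts with it; the function name is chosen for this example.

```python
def is_prefix_free(codewords) -> bool:
    """True if no code word is a prefix of another code word."""
    words = sorted(codewords)
    # After sorting, a word that is a prefix of any other word is
    # immediately followed by a word that starts with it.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(["9", "59", "55"]))       # True
print(is_prefix_free(["9", "5", "59", "55"]))  # False
```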

Binary Prefix Code
A binary prefix code can be represented as a binary tree.
Characteristic: every symbol is a leaf node; no symbol is an internal node.

Prefix Code: Example 2
Prefix-free code {01, 10, 11, 000, 001}
If n_i is the number of code words of length i bits, then:
n_2 = 3 (at level 2 there are 3 code words)
n_3 = 2 (at level 3 there are 2 code words)
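Because no code word is a prefix of another, a bit stream can be decoded greedily from left to right. In this sketch the symbol names a to e are hypothetical labels attached to the five code words of the example.

```python
from collections import Counter

# Hypothetical symbol names for the code words {01, 10, 11, 000, 001}.
CODE = {"a": "01", "b": "10", "c": "11", "d": "000", "e": "001"}

def encode(symbols, code):
    """Concatenate the code words of the symbols, with no separators."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Read bits until they match a code word, emit its symbol, and start over."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

bits = encode("abcde", CODE)
print(bits, decode(bits, CODE))
# Number of code words per length: n_2 = 3 and n_3 = 2.
print(Counter(len(word) for word in CODE.values()))
```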

Prefix Code: Example 3
The codes {0, 01, 011, 0111} and {0, 01, 11} are not prefix-free codes, because in both of them 0 is a prefix of 01.

Exercise - Prefix Code
What are the entropy and the bit rate when the string “aebbcchhffabbacdaaaaaffghbbfff” is encoded with the following prefix code? If each symbol occupies 1 byte, what is the compression ratio?

ASSIGNMENT #1
Write a descriptive paper (including the algorithms) about Tunstall code and Golomb code, in Bahasa Indonesia.

References
[1] Salomon, David. Data Compression: The Complete Reference, 3rd Edition. Springer, 2004.