Bayesian: Single Parameter

Presentation transcript:

Bayesian: Single Parameter. Prof. Nur Iriawan, PhD. Statistics – FMIPA – ITS, Surabaya, 21 February 2006

Frequentist vs Bayesian (Casella and Berger, 1987). The frequentist camp bases itself on the classical toolkit (MLE, method of moments, UMVUE, MSE, and so on), and an analytical derivation is always the solution of choice. The Bayesian camp bases itself on Bayes' theorem, relies on intensive numerical computation, and draws its inferences from where the posterior places the most probability.

Bayes' Theorem (Thomas Bayes, 1702-1761)
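In its standard form, for a parameter θ with prior p(θ) and data x entering through the likelihood f(x|θ), the theorem reads:

```latex
p(\theta \mid x)
  \;=\; \frac{f(x \mid \theta)\, p(\theta)}{\int f(x \mid \theta)\, p(\theta)\, d\theta}
  \;\propto\; f(x \mid \theta)\, p(\theta).
```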

The Bayesian Model (Box and Tiao, 1973; Zellner, 1971; Gelman, Carlin, Stern, and Rubin, 1995). The model rests on the proportional form above, posterior ∝ likelihood × prior: the data, entering through the likelihood, are used to update the prior information into a posterior that is then ready to serve as the basis for inference.

Bayesian: parameters are also treated as variables. In the Bayesian framework, every parameter in the model is treated as a variable. The guiding principle is the full conditional distribution, which is used to study the characteristics of each parameter. Keep the notation for the data likelihood distinct from that for a full conditional distribution.

Motivation for Bayes' theorem (Thomas Bayes). In another form: if x is a random variable whose observations are independent given its parameter θ, then the denominator P(B), the marginal probability of the data, is a constant with respect to θ.

Example: the Icy Road Case. Ice: is there an icy road? Values {Yes, No}, initial probabilities (.7, .3). Watson: does Watson have a car crash? Probabilities (.8, .2) if Ice = Yes, (.1, .9) if Ice = No. How do we reflect changes in our belief as we make observations? These next couple of slides are a brief primer on Bayes nets as they can be used in assessment. In this small example, we have just one student-model variable, "Level of Proficiency." It has two levels, expert and novice, that we can't see directly. What we can see is an examinee take a patient history, and we can determine whether it was adequate or inadequate.

Icy Road: Conditional Probabilities.

              Watson = Yes   Watson = No
  Ice = Yes        .8             .2
  Ice = No         .1             .9

So p(Watson=yes|Ice=yes) = .8 and p(Watson=no|Ice=yes) = .2, and similarly for Ice = No. In applied work, for a specific inferential problem, structure and conditional probabilities may be taken as given. Where do conditional probabilities come from? We'll talk about this a little later. For this example, we can think of them as coming from experience: we observed a large group of people known to be experts, and another large group known to be novices, take patient histories in this particular setting. We observed that 80% of the experts took adequate histories. This figure captures reasoning from proficiency to observable: if we were going to observe another expert solution, then absent other information we'd put an 80% probability on observing them take an adequate history. Similarly, we saw only 40% of the novices take adequate histories.

Icy Road: Likelihoods. With Watson = yes observed, the relevant likelihoods are p(Watson=yes|Ice=yes) = .8 and p(Watson=yes|Ice=no) = .1. Note the 8/1 ratio.

Icy Road: Bayes Theorem, if Watson = yes, before normalizing. Prior × Likelihood ∝ Posterior: for Ice = Yes, .7 × .8 = .56; for Ice = No, .3 × .1 = .03. The sum is .59; we need to divide through by this "normalizing constant" to get probabilities.

Icy Road: Bayes Theorem, if Watson = yes. Dividing each term of the product by the normalizing constant .59 gives the posterior probabilities: p(Ice=yes|Watson=yes) = .56/.59 ≈ .95 and p(Ice=no|Watson=yes) = .03/.59 ≈ .05.
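The whole calculation fits in a few lines; here is a minimal sketch of the prior-times-likelihood-then-normalize step (the numbers are the slide's):

```python
# Icy Road: posterior for Ice given Watson = yes.
prior = {"yes": 0.7, "no": 0.3}      # p(Ice)
lik   = {"yes": 0.8, "no": 0.1}      # p(Watson = yes | Ice)

unnorm = {ice: prior[ice] * lik[ice] for ice in prior}   # .56 and .03
z = sum(unnorm.values())                                 # .59
posterior = {ice: v / z for ice, v in unnorm.items()}
print(posterior)   # {'yes': ~0.95, 'no': ~0.05}
```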

Example: the Normal case. What is the natural representation of a Normal(μ, σ²) distribution, N(μ, σ²)? Which representation is the most representative?

What is the difference between the following ways of presenting the distribution?

Plots of the variables x, μ, and σ in the full conditional Normal.

Interval vs Highest Posterior Density (HPD) (Box and Tiao, 1973; Gelman et al., 1995; Iriawan, 2001). The frequentist confidence interval is constructed in the usual way; in the Bayesian approach the interval is instead approached through the HPD region.
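In standard form, a 100(1-α)% HPD region C for θ given data x satisfies:

```latex
C \;=\; \{\theta : p(\theta \mid x) \ge k\},
\qquad
\Pr(\theta \in C \mid x) \;=\; 1 - \alpha ,
```

where k is chosen so that the coverage is exactly 1-α; every point inside C then has higher posterior density than any point outside, which makes C the tightest region of that coverage.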

Representing regions of equal density (Iriawan, 2001).

The compromise in the control chart.

HPD on the Individual Control Chart.

  (1-α) × 100%   Lower Control Limit   Upper Control Limit
  95.0           71.3953               109.481
  97.5           64.4857               110.915
  99.0           55.3356               112.775

Example: the Bernoulli case. As with the Normal above, x ~ Ber(x; p), where in the frequentist view p is treated as a constant. What if, because observation sites and situations differ, p turns out to vary? Following the Bayesian principle, p is treated as a variable precisely so the model can accommodate a situation like this.

Suppose p varies according to a Beta(α, β) distribution, f(p) = [Γ(α+β) / (Γ(α)Γ(β))] p^(α-1) (1-p)^(β-1) for 0 < p < 1. What happens then?

Suppose one Bernoulli observation has been made. The posterior distribution is then f(p|x) = f(x|p) f(p) / ∫₀¹ f(x|p) f(p) dp, with numerator proportional to p^(x+α-1) (1-p)^(β-x).

By the definition of the Beta function, the denominator evaluates to ∫₀¹ p^(x+α-1) (1-p)^(β-x) dp = B(α+x, β-x+1).

So the posterior distribution of p after that single observation is Beta(α+x, β-x+1).

The Bayes Estimator. The Bayesian estimate of p is obtained by minimizing a loss function. Several loss functions can be used, but here we use the quadratic loss function, which is consistent with mean squared error (MSE). In general, the Bayes estimate of θ under this loss is the posterior mean, θ̂ = E(θ | x) ((Carlin and Louis, 1996), (Elfessi and Reineke, 2001)).

Taking this expectation over the posterior distribution gives p̂ = E(p|x) = ∫₀¹ p · f(p|x) dp.

As before, the integral is solved by introducing new variables a* = α + x + 1 and b* = β - x + 1; the integral then evaluates to B(a*, b*) / B(α+x, β-x+1).

Simplifying with the identity Γ(z+1) = z Γ(z), this reduces to p̂ = (x + α) / (α + β + 1). Keep this result in mind for the discussion of the compromise between the Bayesian and classical approaches.

Extending this to n Bernoulli trials yielding y successes gives the posterior Beta(α+y, β+n-y), where y is the total number of successes over the n Bernoulli observations x. The Bayes estimate is then p̂ = (y + α) / (α + β + n). Keep this result, too, in mind for the discussion of the compromise between the Bayesian and classical approaches.
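A minimal numerical sketch of these two results (the prior pseudo-counts and the data below are hypothetical, chosen only for illustration):

```python
# Beta-binomial updating: posterior and Bayes estimate of p.
alpha, beta = 2.0, 2.0     # hypothetical Beta prior parameters
n, y = 10, 7               # hypothetical data: 7 successes in 10 trials

post_alpha = alpha + y               # posterior Beta(alpha + y, beta + n - y)
post_beta  = beta + n - y
bayes_est  = post_alpha / (post_alpha + post_beta)   # (y+alpha)/(alpha+beta+n)
mle        = y / n

print(f"posterior: Beta({post_alpha}, {post_beta})")
print(f"Bayes estimate: {bayes_est:.4f}  vs  MLE: {mle:.4f}")
```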

Priors and the Bayesian Method (Gelman et al., 1995). Because the parameter θ is treated as a variable, in the Bayesian view θ takes values in a domain Θ with density f(θ); this density is what we call the prior distribution of θ. Once this prior information is combined with the current data X to form the posterior of θ, the computation becomes straightforward: one only has to compute the conditional density of θ given X = x. Criticism of the Bayesian approach usually focuses on the "legitimacy and desirability" of treating θ as a random variable and on how precisely its prior distribution can be defined or chosen.

The ideal shapes of prior, likelihood, and posterior: a proper/conjugate prior combines with the likelihood to give a well-behaved posterior (figure: prior and posterior densities over θ).

What if the prior is chosen like this? Such a choice is a misleading prior, so the posterior will have no clear form (figure: likelihood, prior, and the resulting posterior over θ).

A prior with equal density over its whole domain: the figure shows such a flat (improper) prior together with the likelihood and the resulting posterior over θ.

Interpretations of the prior distribution: as a frequency distribution; as a normative, objective representation of what it is rational to believe about a parameter; or as a representation of one person's own subjective assessment of the parameter.

The prior as a representation of a frequency distribution. Sometimes the value of a parameter is generated from the mode of an earlier data pattern, whether symmetric or not. In industrial process inspection, defect data from previous batches are typically used as prior information for the next batch. A prior of this kind usually has a physical meaning tied to the frequency of the events in its data.

The normative/objective interpretation of a prior. The central problem in making a prior interpretable is how to choose a prior distribution for an unknown parameter that still fits the physical problem at hand. If θ only takes values within a certain range, it is quite reasonable to use a prior with equal density everywhere (equally likely, uniformly distributed); the interpretation is that every state is given the same chance to be selected as a supporter of the likelihood in forming the posterior. A prior can take on a very odd meaning if it is chosen badly.

Priors for continuous parameters: invariance arguments. In the case of a Normal mean μ, for example, one can require that all intervals (a, a+h) have the same prior probability for any given h and a. Every point in such an interval then has the same chance of being selected, which leads to a uniform prior (an "improper prior"). For a scale parameter σ, assigning equal prior probability to intervals of the form (a, ka) implies a prior proportional to 1/σ; again, this yields an improper prior.

Kinds of priors. Conjugate vs non-conjugate priors ((Box and Tiao, 1973), (Gelman et al., 1995), (Tanner, 1996), (Zellner, 1971)): whether or not the prior matches the pattern of the data likelihood. Proper vs improper priors (Jeffreys prior): whether the weight/density the prior assigns at each point forms a proper density, uniformly distributed or not. Informative vs non-informative priors: whether the distributional pattern/frequency of the data is already known or not. Pseudo-priors (Carlin and Chib, 1995): priors whose values are set to match results elaborated from frequentist methods (e.g., regression by OLS).

Continuous parameters. A uniform prior is usually used (at least if the parameter space is of finite extent). But if θ is uniform, a non-linear function of θ, g(θ), will not be uniform. For example, take p(θ) = 1 for θ > 0 and re-parameterize as φ = g(θ); the change-of-variables formula gives p(φ) = |d g⁻¹(φ)/dφ|, which is not constant in general. So "ignorance about θ" does not imply "ignorance about g(θ)"; the notion of prior "ignorance" may be untenable.

Turning this process around slightly, Bayesian analysis assumes that we can make some kind of probability statement about parameters before we start. The sample is then used to update our prior distribution.

First, suppose that the prior can be represented as a probability density function p(θ), where θ is the parameter to be learned. Based on a sample X (through the likelihood function), we can then update the prior distribution using Bayes' rule.

Some conjugate priors: for example, a Beta prior for a binomial proportion, a Gamma prior for a Poisson mean, a Gamma prior for an exponential rate, and a Normal prior for a Normal mean with known variance.

The Jeffreys Prior (single parameter). The Jeffreys prior is given by p(θ) ∝ √I(θ), where I(θ) is the expected Fisher information. This is invariant to transformation in the sense that all parametrizations lead to the same prior. One can also argue that it is uniform for a parametrization in which the likelihood is completely determined (see Box and Tiao, 1973, Section 1.3).

Jeffreys prior for the binomial. Working through the definition gives a prior proportional to p^(-1/2) (1-p)^(-1/2): a beta distribution with parameters ½ and ½.
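The derivation is short; for x successes in n trials:

```latex
\log f(x \mid p) = x\log p + (n-x)\log(1-p) + \text{const},
\qquad
I(p) = -\,\mathbb{E}\!\left[\frac{\partial^{2}\log f}{\partial p^{2}}\right] = \frac{n}{p(1-p)},
\qquad
p_J(p) \;\propto\; \sqrt{I(p)} \;\propto\; p^{-1/2}(1-p)^{-1/2}
\;\sim\; \mathrm{Beta}\!\left(\tfrac{1}{2},\tfrac{1}{2}\right).
```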

Other examples of Jeffreys priors: for instance, p(λ) ∝ λ^(-1/2) for a Poisson mean, p(μ) ∝ 1 for a Normal mean, and p(σ) ∝ 1/σ for a Normal scale.

Improper priors → trouble in the posterior (sometimes). Suppose Y1, …, Yn are independently normally distributed with constant variance σ² and a mean structure involving parameters γ, β, and ρ. Suppose it is known that ρ is in [0,1], ρ is uniform on [0,1], and γ, β, and σ have improper priors. Then for any observations y, the marginal posterior density of ρ is proportional to an expression involving a function h that is bounded and has no zeroes in [0,1]. This posterior is an improper distribution on [0,1]!

An improper prior usually still yields a proper posterior.

Another example: an improper prior leading to a proper posterior.

Subjective Degrees of Belief. Probability represents a subjective degree of belief held by a particular person at a particular time. There are various techniques for eliciting subjective priors; for example, Good's device of imaginary results. E.g., for a binomial experiment with a beta prior, a = b: "imagine" the experiment yields 1 tail and n-1 heads. How large should n be in order that we would just give odds of 2 to 1 in favor of a head occurring next? (E.g., n = 4 implies a = b = 1.)

Problems with Subjectivity. What if the prior and the likelihood disagree substantially? The subjective prior cannot be "wrong", but it may be based on a misconception; the model may also be substantially wrong. In practice one often uses hierarchical models, as on the next slide.

Hierarchical Model: an example for the binomial case. The observable is Binomial(n, p), with p ~ Beta(a, b) and n ~ Poisson(λ); at the next level the hyperparameters get their own priors, e.g., λ ~ Gamma(c, d), a ~ Gamma(e, f), and b ~ Gamma(g, h) (figure: the corresponding directed graph).

General Comments. Determining subjective priors is difficult, and it is difficult to assess the usefulness of a subjective posterior. Don't be misled by the term "subjective"; all data analyses involve appreciable personal elements.

Once again, an example with a continuous variable: a beta-binomial example. The setup: we are flipping a biased coin, where the probability of heads p could be anywhere between 0 and 1. We are interested in p. We will have two sources of information: prior beliefs, which we will express as a beta distribution, and data, which will come in the form of counts of heads in 10 independent flips.

The beta-binomial example: the prior distribution. Let's suppose we think it more likely that the coin is close to fair, so p is probably nearer to .5 than to either 0 or 1. We don't have any reason to think it is biased toward either heads or tails, so we'll want a prior distribution that is symmetric around .5. We're not very sure about what p might be: say, about as sure as only 6 observations' worth. This corresponds to 3 pseudo-counts of H and 3 of T, which, if we want to use a beta distribution to express this belief, corresponds to beta(4,4).

The prior distribution, continued. Beta: defined on [0,1], the conjugate prior for the probability parameter in Bernoulli and binomial models; here p ~ dbeta(4,4). The figure annotates the shape parameters as pseudo-counts of successes and failures ("prior sample info") and the arguments as the success and failure probabilities. For Beta(4,4): Mean(p) = 4/8 = .5, Mode(p) = 3/6 = .5, Variance(p) ≈ .028.

The beta-binomial example: the likelihood. Next we will flip the coin ten times. Assuming the same true (but unknown to us) value of p is in effect for each of ten independent trials, we can use the binomial distribution to model the probability of getting any number r of heads: p(r | p) = C(10, r) p^r (1-p)^(10-r), where r counts observed successes and 10-r observed failures.

The likelihood, continued. We flip the coin ten times and observe 7 heads, i.e., r = 7. The likelihood is obtained using the same form as on the preceding slide, except that now r is fixed at 7 and we are interested in the relative value of this function at different possible values of p: L(p) ∝ p^7 (1-p)^3.

Obtaining the posterior by Bayes Theorem: posterior ∝ likelihood × prior. General form: p(y | x*) ∝ p(x* | y) p(y); in our example, 7 plays the role of x* and p plays the role of y. Before normalizing: p(p | r=7) ∝ p^7(1-p)^3 × p^3(1-p)^3 = p^10(1-p)^6. After normalizing, this is a Beta(11, 7) density. Now, how can we get an idea of what this means we believe about p after combining our prior belief and our observations?

In pictures: Prior × Likelihood → Posterior (figure: the three densities over p).

Using the fact that we have conjugate distributions. Now p(p | r=7) ∝ p^10(1-p)^6, which is just the kernel of a beta(11,7) distribution. This is rather special: the data were observed in accordance with a probability function that has the same mathematical form as the likelihood once data are observed, and we chose a prior distribution (here, a beta) that combines with the likelihood so as to produce another distribution in the same parametric family (another beta), just with updated parameters. We can work out its summary statistics: Mean(p) = 11/18 ≈ .611 (the prior mean was .5), Variance(p) ≈ .0125 (prior: .028), Mode(p) = 10/16 = .625 (prior: .5).
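A quick check of those summaries, as a small sketch using SciPy's beta distribution:

```python
# Exact summaries of the Beta(11, 7) posterior.
from scipy import stats

post = stats.beta(11, 7)
print(post.mean())                 # 11/18 ~ 0.611
print(post.var())                  # ~ 0.0125
print((11 - 1) / (11 + 7 - 2))     # mode (a-1)/(a+b-2) = 0.625
```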

Using BUGS. What BUGS does in this simple one-variable problem is sample lots of values from the posterior distribution of p; that is, from its distribution as determined first by the prior and then conditioned on the observed data. The summary statistics from 50,000 draws: Mean(p) ≈ .6116 and Variance(p) ≈ .0125, against the prior values of .5 and .028.
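A rough stand-in for that run, assuming we sample the known Beta(11, 7) posterior directly (BUGS itself uses Gibbs sampling, but in this conjugate case the draws target the same distribution):

```python
# Monte Carlo summaries from ~50,000 posterior draws.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.beta(11, 7, size=50_000)
print(draws.mean(), draws.var())   # close to the exact 0.6111 and 0.0125
```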

Using BUGS: the BUGS model setup for this problem (figure: the BUGS model specification window).

Looking ahead to sampling-based approaches with many variables. BUGS = Bayesian inference Using Gibbs Sampling. The basic idea: model a multi-parameter problem in terms of assemblies of distributions and functions for all data and all parameters (taking advantage of conditional independence whenever possible), e.g., p(Data|x,y) p(x|z) p(y) p(z) (*). Observe Data*; the posterior p(x,y,z|Data*) is proportional to (*). The normalizing constant is hard to evaluate, but ...

... we can instead draw values from the "full conditional" distributions. Start with a possible value for each variable in cycle 0. In cycle t+1: draw x_(t+1) from p(x | Y=y_t, Z=z_t, Data*); draw y_(t+1) from p(y | X=x_(t+1), Z=z_t, Data*); draw z_(t+1) from p(z | X=x_(t+1), Y=y_(t+1), Data*). Under suitable conditions, this series of draws comes to approximate draws from the actual joint posterior of all the parameters.
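A minimal Gibbs sketch for a two-parameter case: Normal data with unknown mean μ and precision τ, a N(0, 10²) prior on μ, and a Gamma(0.1, 0.1) prior on τ (all of these modelling choices are hypothetical, picked only so that the full conditionals are exact):

```python
# A Gibbs sampler alternating draws from two full conditionals.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=100)   # synthetic data (hypothetical)
n, s = len(y), y.sum()

m0, v0 = 0.0, 100.0                  # prior: mu ~ N(m0, v0)
a0, b0 = 0.1, 0.1                    # prior: tau ~ Gamma(a0, rate=b0)

mu, tau = 0.0, 1.0                   # cycle-0 starting values
keep = []
for t in range(5000):
    # draw mu from p(mu | tau, y): Normal
    prec = 1.0 / v0 + n * tau
    mean = (m0 / v0 + tau * s) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # draw tau from p(tau | mu, y): Gamma (numpy takes shape, scale)
    rate = b0 + 0.5 * ((y - mu) ** 2).sum()
    tau = rng.gamma(a0 + n / 2.0, 1.0 / rate)
    keep.append(mu)

print(np.mean(keep[1000:]))          # ~ 5, after discarding burn-in
```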

Inference in a chain. Recursive representation: p(u,v,x,y,z) = p(z|y,x,v,u) p(y|x,v,u) p(x|v,u) p(v|u) p(u) = p(z|y) p(y|x) p(x|v) p(v|u) p(u). (Figure: the chain U → V → X → Y → Z with edge probabilities p(v|u), p(x|v), p(y|x), p(z|y).)

Inference in a chain. Suppose we learn the value of X: start there, by revising belief about X. (Same chain figure.)

Inference in a chain. Propagate information down the chain using conditional probabilities: from the updated belief about X, use conditional probability to revise belief about Y.

Continue down the chain: from the updated belief about Y, use conditional probability to revise belief about Z.

Inference in a chain. Propagate information up the chain using Bayes Theorem: from the updated belief about X, use Bayes Theorem to revise belief about V.

Continue up the chain: from the updated belief about V, use Bayes Theorem to revise belief about U.

Inference in singly-connected nets. Singly connected: there is never more than one path from one variable to another. Chains and trees are singly connected. One can use repeated applications of Bayes theorem and conditional probability to propagate evidence (Pearl, early 1980s). (Figure: a singly-connected net over U, V, X, Y, Z.)

Posterior Summaries. Mean, median, mode, percentiles, etc.; the central 95% interval versus the highest posterior density region (normal mixture example…).

Bayesian Confidence Intervals. Apart from providing an alternative procedure for estimation, the Bayesian approach provides a direct procedure for formulating parameter confidence intervals. Returning to the simple case of a single coin toss, the probability density function of the estimator becomes the posterior density of p.

As previously discussed, taking a = b = 1.4968, the Bayesian estimator of p after observing one head is (1 + a)/(a + b + 1) = 2.4968/3.9936 ≈ .6252.

Using the posterior distribution function, we can also compute the probability that the value of p is less than .5 given a head. Please verify this result! Hence, we have a very formal statement of confidence intervals such as P(0.3 < p < 0.7).
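A short verification sketch, using the Beta posterior implied by the slide's numbers (prior a = b = 1.4968, one head observed, so the posterior is Beta(2.4968, 1.4968)):

```python
from scipy import stats

post = stats.beta(1 + 1.4968, 1.4968)     # Beta(a + x, b + 1 - x), x = 1
print(post.mean())                        # ~ 0.6252, the Bayes estimate
print(post.cdf(0.5))                      # P(p < .5 | one head)
print(post.cdf(0.7) - post.cdf(0.3))      # P(0.3 < p < 0.7 | one head)
```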

Prediction. The "posterior predictive density" of a future observation ỹ; binomial example with n = 20, x = 12, a = 1, b = 1 (figure: the predictive density of ỹ).
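A sketch of that predictive density. With a = b = 1 and x = 12 successes in n = 20 trials, the posterior is Beta(13, 9); assuming the future batch is also 20 trials (the slide does not fix this), the predictive is beta-binomial:

```python
# Posterior predictive for a future count of successes.
from scipy import stats

a_post, b_post = 1 + 12, 1 + 20 - 12        # posterior Beta(13, 9)
pred = stats.betabinom(20, a_post, b_post)  # predictive over y = 0..20
for y in range(0, 21, 4):
    print(y, round(pred.pmf(y), 4))
```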

Prediction for the Univariate Normal.

Prediction for the Univariate Normal: the posterior predictive distribution is itself Normal.

Prediction for a Poisson.

On the Compromise of Bayesian to Classical Estimation (presented at the South-East Asia Stat & Math Muslim Society Conference). Nur Iriawan, Statistics Department, Institut Teknologi Sepuluh Nopember, Jl. Arief Rahman Hakim, Sukolilo, Surabaya 60111, Indonesia. iriawann@sby.centrin.net.id

Example: the Exponential. Suppose x is exponentially distributed with rate λ. The MLE of λ is λ̂ = n / Σxᵢ = 1/x̄.

Using the Bayesian approach with an (improper) prior for λ, the likelihood is L(λ) = λ^n exp(-λ Σxᵢ); the posterior of λ given the data X is then proportional to the prior times this likelihood.

The Bayes estimator for λ can be derived as the posterior mean, λ̂_B = E(λ | X); for instance, under a Jeffreys-type prior p(λ) ∝ 1/λ the posterior is Gamma(n, Σxᵢ) with mean n / Σxᵢ.

Numerical Calculation. One thousand data points were generated from an exponential distribution. The classical MLE gives the result shown in the MINITAB output (figure).

Using WinBUGS, the Bayes estimator comes out as shown in the WinBUGS output (figure).
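A reproducible stand-in for that comparison (the true rate, the seed, and the Jeffreys prior below are assumptions; the slide's actual data and outputs are not preserved):

```python
# MLE vs Bayes posterior mean for an exponential rate, n = 1000.
import numpy as np

rng = np.random.default_rng(2)
lam_true = 1.5                          # hypothetical true rate
x = rng.exponential(1.0 / lam_true, 1000)

mle = len(x) / x.sum()                  # classical MLE: n / sum(x)
# With Jeffreys prior p(lam) ~ 1/lam the posterior is Gamma(n, sum(x)),
# whose mean n / sum(x) coincides with the MLE -- the "compromise".
bayes = len(x) / x.sum()
print(f"MLE = {mle:.4f}, Bayes posterior mean = {bayes:.4f}")
```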

Look back at the binomial result. The Bayes estimator obtained there was p̂ = (y + α)/(α + β + n), while the classical approach gives p̂ = y/n. What if α = β = 0? The Bayes estimator then becomes identical to the classical one. Likewise, if these values are plugged into the beta prior, that prior degenerates into an improper Jeffreys'-style prior.

Summary. The Bayesian estimator reported here as the posterior mean was generated from an improper prior distribution. It has been shown that when there is no information about the prior of the model's parameter and a constant or Jeffreys prior is used, the resulting estimator is a compromise between the Bayesian and the classical estimator.

Numerical Integration: the Monte Carlo Method (Law and Kelton, 2000). Suppose we want to compute the integral I = ∫_a^b g(x) dx. If g(x) is complicated, evaluating I analytically can be quite awkward, but a simple numerical scheme yields I as follows.

Define a new random variable Y = (b - a) g(X), where X is uniform on the interval (a, b), i.e., X ~ U(a, b). Then compute the expectation of Y: E(Y) = (b - a) E[g(X)].

Since E[g(X)] = ∫_a^b g(x)/(b - a) dx, we have E(Y) = I, so the integral can be approximated numerically by Î = (b - a) × (1/n) Σ g(xᵢ). That is: generate uniformly distributed values, plug them into g(x), and take their average (scaled by b - a) as the estimate of the integral being sought.
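A minimal sketch of the scheme (the test integrand below is an arbitrary choice):

```python
# Monte Carlo estimate of I = integral of g over (a, b),
# using I ~ (b - a) * mean(g(X)), X ~ U(a, b).
import numpy as np

def mc_integral(g, a, b, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)
    return (b - a) * g(x).mean()

# Example: integral of x^2 over (0, 1) is 1/3.
print(mc_integral(lambda x: x**2, 0.0, 1.0))   # ~ 0.333
```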

How many data points must be generated? As many as it takes for the running mean to converge; the initial stretch before convergence is the burn-in.

Other ways to estimate the integral with random number generation. Kinds of random number generators (RNG): inverse transform, composition, convolution, acceptance-rejection (AR), and adaptive acceptance-rejection (AAR).

The Inverse Transform. Requirement: the distribution must have a closed-form CDF, F(x). The method: draw u ~ U(0,1) and set x = F⁻¹(u) (figure: u on the vertical axis of F(x) mapped back to x).
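A small sketch for the exponential case, whose CDF inverts in closed form:

```python
# Inverse-transform sampling for the exponential distribution:
# F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0
u = rng.uniform(size=10_000)
x = -np.log1p(-u) / lam          # inverse CDF applied to uniforms
print(x.mean())                  # ~ 1/lam = 0.5
```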

Composition (mixture form). Consider a density f(x) assembled from pieces: data in region I are generated from a half-Normal, and data in region II from an Exponential (figure: f(x) split into regions I and II).

Convolution. For an Erlang(m, λ) random variable, data can be generated by convolving, i.e., summing, m independent Exponential(λ) draws.

Acceptance-Rejection (AR). Very useful for functions that may not even be proper pdfs, and it can accommodate functions without a closed-form CDF. The idea: generate candidates from a proposal density r(x), and accept or reject them by comparing f(x) with a majorizing envelope t(x) ≥ f(x) (figure: f(x) under the envelope t(x), with the accept and reject regions marked).

The AR algorithm: generate x ~ r(x); generate u ~ U(0,1); if u ≤ f(x)/t(x), accept x, else reject x and repeat.
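A compact sketch, with Beta(2,2) as an assumed target and a uniform proposal, so that the constant t(x) = 1.5 majorizes f(x) = 6x(1-x):

```python
# Acceptance-rejection sampling from f(x) = 6x(1-x), the Beta(2,2) pdf.
import numpy as np

rng = np.random.default_rng(4)

def ar_sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform()            # candidate x ~ r (uniform proposal)
        u = rng.uniform()            # u ~ U(0,1)
        f = 6.0 * x * (1.0 - x)      # target density at the candidate
        if u <= f / 1.5:             # accept with probability f(x)/t(x)
            out.append(x)
    return np.array(out)

s = ar_sample(10_000)
print(s.mean())                      # ~ 0.5, the Beta(2,2) mean
```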