Bayesian: Single Parameter Prof. Nur Iriawan, PhD. Statistika – FMIPA – ITS, SURABAYA 21 Februari 2006
Frequentist Vs Bayesian (Casella dan Berger, 1987) Grup Frequentist Grup yang mendasarkan diri pada cara klasik: MLE, Moment, UMVUE, MSE, dll Pendekatan analitis selalu sebagai solusi Grup Bayesian Grup yang mendasarkan diri pada cara Bayesian Pendekatan numerik serta komputasi secara intensif Inference lebih didasarkan pada kemungkinan muncul terbesar Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 2
Teorema Bayes (Thomas Bayes, 1702-1761) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 3
Model Bayesian (Box dan Tiao, 1973), (Zellner, 1971), (Gelman, Stern, Carlin, dan Rubin, 1995) Mengacu pada bentuk proporsional Yang dibentuk sebagai Bahwa data yang dibentuk sebagai likelihood digunakan sebagai bahan untuk meng-update informasi prior menjadi sebuah informasi posterior yang siap untuk digunakan sebagai bahan inferensi. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 4
Bayesian: Parameter juga diperlakukan sebagai variabel Dalam Bayesian semua parameter dalam model diperlakukan sebagai variabel Prinsip berfikir sebagai bentuk Full Conditional Distribution digunakan untuk mempelajari karakteristik setiap parameter Dibedakan antara simbol penyajian likelihood data dan Full Conditional Distribution. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 5
Motivasi Bayesian Theorema Bayes Thomas Bayes Pada bentuk lain jika adalah suatu r.v yang independen dengan θ adalah parameternya, maka P(B) adalah konstan Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 6
Example: the Icy Road Case Ice: Is there an icy road? Values {Yes, No} Initial Probabilities (.7, .3) Watson: Does Watson have a car crash? Probabilities (.8, .2) if Ice=Yes, (.1, .9) if Ice=No. How do we reflect changes in our belief as we make observations? These next couple of slides are are brief primer on Bayes nets as they can be used in assessment. In this small example, we have just one student-model variable, “Level of Proficiency.” It has two levels, expert and novice, that we can’t see directly. What we can see is an examinee take a patient history, and we can determine whether it was adequate or inadequate. . Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 7
Icy Road: Conditional Probabilities Watson Yes No Ice Yes .8 .2 Here are the conditional probabilities for an expert. In applied work, for a specific inferential problem, structure and conditional probabilities may be taken as given. Where do conditional probabilities come from? We’ll talk about this a little later. For this example, we can think of them as coming from an expert where we observed a large group of people known to be experts, and another large group of people known to be novices, take patient histories in this particular setting. We observed that 80% of the experts took adequate histories. This figures captures reasoning from proficiency to observable. If we we going to observe another expert solution, then absent other information we’d put an 80% probability on observing them take an adequate history. Similarly, we saw only 40% of the novices take adequate histories. Re No .1 .9 p(Watson=no|ice=yes) p(Watson=yes|Ice=yes) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 8
Icy Road: Likelihoods Note: 8/1 ratio p(Watson=yes|Ice=yes) .8 .2 Here are the conditional probabilities for an expert. In applied work, for a specific inferential problem, structure and conditional probabilities may be taken as given. Where do conditional probabilities come from? We’ll talk about this a little later. For this example, we can think of them as coming from an expert where we observed a large group of people known to be experts, and another large group of people known to be novices, take patient histories in this particular setting. We observed that 80% of the experts took adequate histories. This figures captures reasoning from proficiency to observable. If we we going to observe another expert solution, then absent other information we’d put an 80% probability on observing them take an adequate history. Similarly, we saw only 40% of the novices take adequate histories. Re No .1 .9 p(Watson=yes|Ice=no) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 9
Icy Road: Bayes Theorem: If Watson = yes -- Before Normalizing Prior * Likelihood µ Posterior Watson .8 .9 .2 .1 No Yes Ice .56 .03 Yes No Ice .7 .3 No Ice Yes Here are the conditional probabilities for an expert. In applied work, for a specific inferential problem, structure and conditional probabilities may be taken as given. Where do conditional probabilities come from? We’ll talk about this a little later. For this example, we can think of them as coming from an expert where we observed a large group of people known to be experts, and another large group of people known to be novices, take patient histories in this particular setting. We observed that 80% of the experts took adequate histories. This figures captures reasoning from proficiency to observable. If we we going to observe another expert solution, then absent other information we’d put an 80% probability on observing them take an adequate history. Similarly, we saw only 40% of the novices take adequate histories. Re Sum = .59. Need to divide through by this ‘normalizing constant’ to get probabilities. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 10
Icy Road: Bayes Theorem: If Watson = yes Prior * Likelihood µ Posterior Watson .8 .9 .2 .1 No Yes Ice .56 .03 Yes No Ice .95 .05 Yes No Ice .7 .3 No Ice Yes Here are the conditional probabilities for an expert. In applied work, for a specific inferential problem, structure and conditional probabilities may be taken as given. Where do conditional probabilities come from? We’ll talk about this a little later. For this example, we can think of them as coming from an expert where we observed a large group of people known to be experts, and another large group of people known to be novices, take patient histories in this particular setting. We observed that 80% of the experts took adequate histories. This figures captures reasoning from proficiency to observable. If we we going to observe another expert solution, then absent other information we’d put an 80% probability on observing them take an adequate history. Similarly, we saw only 40% of the novices take adequate histories. Re Posterior probabilities -- each term in the product divided through by the normalizing constant .59. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 11
Contoh pada kasus Normal Representasi alami suatu distribusi Normal(μ,σ2) atau N(μ,σ2) ? Mana representasi yang representatif ? Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 12
? Apa perbedaan antara penyajian berikut ini? Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 13
Plot variabel x, μ dan σ dalam full conditional Normal Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 14
Interval vs Highest Posterior Density (HPD) (Box dan Tiao, 1973),(Gelman et.al, 1995), (Iriawan, 2001) Pembentukan interval konfidensi pada frequentist adalah sbb Pembentukan interval konfidensi pada Bayesian didekati dengan HPD. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 15
Representasi Kesamaan Densitas (Iriawan, 2001) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 16
Compromise dalam Control Chart Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 17
HPD pada Control Chart Individu Peta Kendali (1-) x 100% Batas Kendali Bawah Batas Kendali Atas 95,0 71,3953 109,481 97,5 64,4857 110,915 99,0 55,3356 112,775 Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 18
Contoh Kasus pada Bernoulli Seperti halnya pada Normal sebelumnya, x~Ber(x;p) disajikan sbb: dimana pada frequentist, p dianggap konstan Bagaimana jika karena situasi dan tempat pengamatan yang berbeda dan diperoleh p berubah-ubah? Prinsip Bayesian, p akan diperlakukan menjadi sebuah variabel agar mempunyai kemampuan akomodatif pada keadaan seperti di atas. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 19
Anggap p berubah sesuai dengan distribusi Beta(α,β), seperti berikut: apa yang akan terjadi? Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 20
Anggap satu pengamatan bernoulli telah dilakukan, maka posterior distribusinya adalah sbb: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 21
Sesuai dengan spesifikasi fungsi Beta, maka penyebut dapat diproses sbb: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 22
Sehingga distribusi posterior untuk p setelah satu observasi tersebut adalah Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 23
Estimator Bayes Bayesian estimate dari p dapat diperoleh dengan meminimumkan loss function. Beberapa loss functions dapat digunakan, tetapi disini akan digunakan quadratic loss function yang konsisten dengan mean square errors (MSE) Secara umum, estimasi θ dengan pendekatan Bayes sbb ((Carlin and Louis, 1996), and (Elfessi and Reineke, 2001)) : Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 24
Dengan memperlakukan expektasi pada posterior distribution diperoleh Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 25
Seperti sebelumnya, diselesaikan integral tersebut dengan membuat variabel baru a*=a+x+1 dan b*=b-x+1. Integralnya akan memberikan hasil sbb: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 26
Dengan menggunakan penyederhanaan seperti berikut Maka, Atau Ingat hasil ini kembali pada saat pembahaan Compromising Bayesian dengan Classical Approaches Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 27
Pengembangan hasil ini ke bentuk n buah percobaan Bernoulli akan menghasilkan sebanyak y sukses memberikan hasil Dimana y adalah jumlah sukses dari observasi setiap bernoulli x. Nilai taksiran y adalah sebagai berikut: Ingat hasil ini kembali pada saat pembahaan Compromising Bayesian dengan Classical Approaches Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 28
Prior dan Metode Bayesian (Gelman et.al, 1995) Karena parameter diperlakukan sebagai variabel maka dalam Bayesian akan mempunyai nilai dalam domain , dengan densitas f (). Dan densitas inilah yang akan dinamakan sebagai distribusi prior dari . Dengan adanya informasi prior yang dipadukan dengan data / informasi saat itu, X, yang digunakan dalam membentuk posterior , maka penghitungan posteriornya akan semakin mudah, yaitu hanya dengan menghitung densitas bersyarat dari diberikan oleh X=x . Kritikan pada Bayesian biasanya terfokus pada “legitimacy dan desirability” untuk menggunakan sebagai random variabel dan ketepatan mendefinisikan/memilih distribusi prior-nya. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 29
Bentuk Prior, Likelihood, dan Posterior yang ideal Proper/ conjugate Posterior Prior θ Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 30
Bagaimana jika pemilihan priornya seperti berikut ini? Pemilihan prior seperti ini akan Merupakan sebuah misleading prior, Sehingga posteriornya tidak akan Jelas bentuknya. ? Likelihood Posterior Prior θ Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 31
Prior yang serba sama densitasnya di semua domain Likelihood improper posterior prior θ Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 32
Interpretasi distribusi Prior Sebagai bentuk distribusi frequency Sebagai bentuk representasi normatif dan objectif pada suatu parameter yang lebih rasional untuk dipercayai Sebagai suatu representasi subjectifitas seseorang dalam memandang sebuah parameter menurut penilainnya sendiri Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 33
Prior sebagai representasi Frequensi Distribusi Adakalanya nilai suatu parameter dibangkitkan dari modus pola data sebelumnya baik itu dari pola simetri ataupun tidak simetri Dalam sebuah inspeksi dalam proses industri, data kerusakan pada batch sebelumnya biasanya akan digunakan sebagai estimasi informasi prior untuk keadaan batch selanjutnya Prior biasanya mempunyai arti fisik sesuai dengan frequensi kejadian data-datanya Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 34
Interpretasi Normative/Objective dari suatu prior Permasalahan pokok agar prior dapat interpretatif adalah bagaimana memilih distribusi prior untuk suatu parameter yang tidak diketahui namun sesuai dengan permasalahan fisik yang ada. Jika hanya mempunyai nilai-nilai pada range yang tertentu saja, hal ini cukup beralasan jika digunakan prior yang mempunyai densitas serba sama (equally likelly / uniformly distributed). Interpretasinya adalah bahwa setiap kondisi diberi kesempatan yang sama untuk dapat terpilih sebagai suporter likelihood dalam membentuk posteriornya. Prior dapat mempunyai arti yang sangat janggal jika salah dalam pemilihannya Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 35
Kasus prior dalam Continuous Parameters Invariance arguments. Hal ini akan dapat terjadi, sebagai contoh dalam kasus Normal mean m, dapat diartikan bahwa semua titik dalam semua interval (a,a+h) harus mempunyai probabilitas prior untuk semua h dan a yang diketahui. Hal ini akan memberikan pengertian bahwa untuk semua titik dalam interval tersebut mempunyai kesempatan sama terpilih atau cenderung mempunyai uniform prior (“improper prior”) Untuk parameter, s, dalam interval (a,ka) akan mempunyai prior probabilitas yang sama, yang hal ini akan memberikan arti bahwa priornya akan proportional pada nilai 1/ s. Lagi-lagi hal ini juga menghasilkan sebuah improper prior. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 36
Macam-macam Prior Conjugate prior vs non-conjugate prior ((Box dan Tiao, 1973),(Gelman et.al, 1995), (Tanner, 1996), (Zellner, 1971)) Prior terkait dengan pola model likelihood datanya Proper prior vs Improper prior (Jeffreys prior) Prior terkait pada pemberian pembobotan/ densitas di setiap titik, uniformly distributed atau tidak Informative prior vs Non-Informative prior Prior terkait dengan sudah diketahui pola/frekuensi distribusi dari datanya atau belum Pseudo-prior (Carlin dan Chib, 1995) Prior terkait dengan pemberian nilainya yang disetarakan dengan hasil elaborasi dari frequentist (misal regresi dengan OLS) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 37
Continuous Parameters Biasanya digunakan uniform prior (at least if the parameter space is of finite extent) Tetapi jika adalah uniform, maka suatu bentuk fungsi non-linear dari , g(), tidak akan uniform Contoh jika p()=1, >0. Re-parameterisasi sebagai maka: dimana sehingga: “ignorance about ” does not imply “ignorance about g.” The notion of prior “ignorance” may be untenable (mungkin dapat diperbolehkan)? Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 38
Turning this process around slightly, Bayesian analysis assumes that we can make some kind of probability statement about parameters before we start. The sample is then used to update our prior distribution. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 39
Pertama, anggap bahwa prior yang digunakan dapat direpresentasikan sebagai probability density function p(q) dengan q adalah parameter yang akan dipelajari. Berdasarkan pada sampel X (likelihood function) kita akan dapat meng-update distribusi priornya mengguankan Bayes rule Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 40
Beberapa Conjugate priors Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 41
The Jeffreys Prior (single parameter) Jeffreys prior diberikan sebagai berikut: dimana adalah expected Fisher Information This is invariant to transformation in the sense that all parametrizations lead to the same prior Can also argue that it is uniform for a parametrization where the likelihood is completely determined (see Box and Tiao, 1973, Section 1.3) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 42
Contoh Jeffreys pada Binomial Hasil ini adalah suatu bentuk distribusi beta dengan parameters ½ and ½ Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 43
Contoh Jeffreys’ Priors yang lain Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 44
Improper Priors Trouble Posterior (sometimes) Suppose Y1, .,Yn are independently normally distributed with constant variance s2 and with: Suppose it is known that r is in [0,1], r is uniform on [0,1], and g, b, and s have improper priors Then for any observations y, the marginal posterior density of r is proportional to where h is bounded and has no zeroes in [0,1]. This posterior is an improper distribution on [0,1]! Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 45
Improper prior usually proper posterior Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 46
Contoh lain: improper proper Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 47
Subjective Degrees of Belief Probability represents a subjective degree of belief held by a particular person at a particular time Various techniques for eliciting subjective priors. For example, Good’s device of imaginary results. e.g. binomial experiment. beta prior with a=b. “Imagine” the experiment yields 1 tail and n-1 heads. How large should n be in order that we would just give odds of 2 to 1 in favor of a head occurring next? (eg n = 4 implies a=b=1) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 48
Problems with Subjectivity What if the prior and the likelihood disagree substantially? The subjective prior cannot be “wrong” but may be based on a misconception The model may be substantially wrong Often use hierarchical models in practice: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 49
Hierarchical Model Contoh pada kasus Binomial Beta(a, b) Poisson(λ) Gamma(c, d) Gamma(g, h) Gamma(e, f) Beta(a, b) Poisson(λ) Binomial(n, p) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 50
General Comments Determination of subjective priors is difficult Difficult to assess the usefulness of a subjective posterior Don’t be misled by the term of “subjective”; all data analyses involve appreciable personal elements Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 51
Once again: An example with a continuous variable: A beta-binomial example The setup: We are flipping a biased coin, where the probability of heads p could be anywhere between 0 and 1. We are interested in p. We will have two sources of information: Prior beliefs, which we will express as a beta distribution, and Data, which will come in the form of counts of heads in 10 independent flips. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 52
An example with a continuous variable: A beta-binomial example--the Prior Distribution Let’s suppose we think it is more likely that the coin is close to fair, so p is probably nearer to .5 than it is to either 0 or 1. We don’t have any reason to think it is biased toward either heads or tails, so we’ll want a prior distribution that is symmetric around .5. We’re not real sure about what p might be--say about as sure as only 6 observations. This corresponds to 3 pseudo-counts of H and 3 of T, which, if we want to use a beta distribution to express this belief, corresponds to beta(4,4): Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 53
An example with a continuous variable: A beta-binomial example--the Prior Distribution Beta. Defined on [0,1]. Conjugate prior for the probability parameter in Bernoulli & binomial models. p ~ dbeta(4,4) Mean(p): Variance(p): Mode(p): PseudoCount of successes PseudoCount of failures The variable: “success probability” The failure probability Shape, or “prior sample info” The success probability Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 54
An example with a continuous variable: A beta-binomial example--the Likelihood Next we will flip the coin ten times. Assuming the same true (but unknown to us) value of p is in effect for each of ten independent trials, we can use the binomial distribution to model the probability of getting any number of heads: i.e., Count of observed successes The variable Count of observed failures The “success probability” parameter The failure probability The success probability Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 55
An example with a continuous variable: A beta-binomial example--the Likelihood We flip the coin ten times, and observe 7 heads; i.e., r=7. The likelihood is obtained now using the same form as in the preceding slide, except now r is fixed at 7 and we are interested in the relative value of this function at different possible values of p: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 56
An example with a continuous variable: Obtaining the posterior by Bayes Theorem posterior likelihood prior General form: In our example, 7 plays the role of x*, and p plays the role of y. Before normalizing: After normalizing: Now, how can we get an idea of what this means we believe about p after combining our prior belief and our observations? Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 57
An example with a continuous variable: In pictures Prior x Likelihood Posterior Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 58
An example with a continuous variable: Using the fact that we have conjugate distributions Now This is just the kernel of a beta(11,7) distribution. This is rather special. The data were observed in accordance with a probability function which would have that same mathematical form as a likelihood once data are observed. We chose a prior distribution (in this case, a beta distribution) which would combine with the likelihood just so as to produce another distribution in the same parametric family (another beta distribution), just with updated parameters. We can work out its summary statistics: Mean(p): Variance(p): Mode(p): prior was .5 .028 .5 Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 59
An example with a continuous variable: Using BUGS Now What BUGS does in this simple problem with one variable is to sample lots of values from the posterior distribution for p; that is, its distribution as determined first with information from the prior, but further conditional on the observed data. Here are the summary statistics from 50000 draws: Mean(p): Variance(p): Mode(p): prior was .5 .028 .5 .11162~.0125 Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 60
An example with a continuous variable: Using BUGS BUGS setup for this problem: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 61
Looking ahead to sampling-based approaches with many variables BUGS = Bayesian-inference Using Gibbs Sampling Basic idea: Model multi-parameter problem in terms of assemblies of distributions and functions for all data and all parameters (taking advantage of conditional dependence whenever possible). E.g., p(Data|x,y) p(x|z) p(y) p(z). (*) Observe Data*; Posterior p(x,y,z|Data*) is proportional to (*). Hard to evaluate normalizing constant, but ... Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 62
Looking ahead to sampling-based approaches with many variables Can draw values from “full conditional” distributions: Start with a possible value for each variable in cycle 0. In cycle t+1, Draw xt+1 from p(x|Y= yt,Z= zt,Data*) Draw yt+1 from p(y|X= xt+1,Z= zt,Data*) Draw zt+1 from p(z|X= xt+1,Y= yt+1,Data*) Under suitable conditions, these series of draws will come to approximate draws from the actual true joint posterior for all the parameters. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 63
Inference in a chain Recursive representation: p(u,v,x,y,z) = p(z|y,x,v,u) p(y|x,v,u) p(x|v,u) p(v|u) p(u) = p(z|y) p(y|x) p(x|v) p(v|u) p(u). U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 64
Start here, by revising belief about X Inference in a chain Suppose we learn the value of X: Start here, by revising belief about X U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 65
Inference in a chain Propagate information down the chain using conditional probabilities: From updated belief about X, use conditional probability to revise belief about Y U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 66
Inference in a chain Propagate information down the chain using conditional probabilities: From updated belief about Y, use conditional probability to revise belief about Z U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 67
Inference in a chain Propagate information up the chain using Bayes Theorem: From updated belief about X, use Bayes Theorem to revise belief about V U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 68
Inference in a chain Propagate information up the chain using Bayes Theorem: From updated belief about V, use Bayes Theorem to revise belief about U U V X Y Z p(v|u) p(x|v) p(y|x) p(z|y) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 69
Inference in singly-connected nets Singly connected: There is never more than one path from one variable to another variable. Chains and trees are singly connected. Can use repeated applications of Bayes theorem and conditional probability to propagate evidence. (Pearl, early 1980s) V U X Y Z Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 70
Posterior Summaries Mean, median, mode, percentile, etc. Central 95% interval versus highest posterior density region (normal mixture example…) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 71
Bayesian Confidence Intervals Apart from providing an alternative procedure for estimation, the Bayesian approach provides a direct procedure for the formulation of parameter confidence intervals. Returning to the simple case of a single coin toss, the probability density function of the estimator becomes: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 72
As previously discussed, try to give a=b=1 As previously discussed, try to give a=b=1.4968, the Bayesian estimator of P is .6252. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 73
Please verify this result! However, using the posterior distribution function, we can also compute the probability that the value of p is less than .5 given a head: Please verify this result! Hence, we have a very formal statement of confidence intervals as P(0.3 < p < 0.7). Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 74
Prediction “Posterior Predictive Density” of a future observation binomial example, n=20, x=12, a=1, b=1 ~ y y Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 75
Prediction for Univariate Normal Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 76
Prediction for Univariate Normal Posterior Predictive Distribution is Normal Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 77
Prediction for a Poisson Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 78
On the Compromise of Bayesian to Classical Estimation (presented on South-East Asia Stat & Math Muslim Society Conference) Nur Iriawan Statistics Department of Institut Teknologi Sepuluh Nopember Jl. Arief Rahman Hakim Sukolilo, Surabaya 60111, Indonesia iriawann@sby.centrin.net.id Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 79
Example on Exponential Suppose x is exponentially distributed The MLE of is Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 80
Using Bayesian approach with prior of is The likelihood would be Then the posterior of given the data X is Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 81
The Bayes estimator for can be derived using Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 82
Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 83
Numerical Calculation One thousand generated data from Exponential distribution, then The classical MLE give the result (using MINITAB) as follows Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 84
Using WinBUGS, the Bayes estimator is Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 85
Lihat kembali hasil dari Binomial Estimator Bayes diperoleh Cara klasik memberikan hasil bahwa Bagaimana jika α = β = 0? Estimator Bayes akan menjadi sama dengan cara klasik. Demikian halnya jika nilai-nilai ini diterapkan pada prior beta, maka prior tersebut akan berubah menjadi sebuah Jeffreys’ prior. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 86
Summary The Bayesian estimator reported as the posterior mean which is used here is generated from an improper prior distribution. It has been shown that when there is no information about the prior of the parameter of model, a constant or Jeffreys’ prior is used, the resulting estimator will give a compromise result between Bayesian and Classical estimator. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 87
Numerical Integration: Monte Carlo Method (Low dan Kelton, 2000) Anggap kita akan menghitung integral berikut Jika g(x) cukup kompleks maka nilai I akan cukup rumit. Dengan cara numerik seperti beriktu dapat diperoleh nilai I dengan cukup sederhana. Caranya adalah sbb: Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 88
Hitung ekspektasi Y dengan cara berikut Buat random variabel baru dengan x bernilai uniform dalam interval (a,b), atau U(a,b). Hitung ekspektasi Y dengan cara berikut Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 89
Sehingga nilai integral I dapat didekati secara numerik oleh Diketahui bahwa Sehingga nilai integral I dapat didekati secara numerik oleh Berarti, bangkitkan data yang mempunyai distribusi Uniform dan masukkan nilainya ke fungsi g(x) jumlahkan nilainya dan hitung rata-ratanya sebagai taksiran nilai integral yang sedang dicari. Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 90
Berapa banyak data yang harus dibangkitkan? Data harus dibangkitkan sebanyak mungkin sampai nilai rata-ratanya mencapai titik konvergen. Burn-in Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 91
Cara lain menghitung nilai estimasi integral dengan RNG Macam Random Number Generator (RNG) Transformasi Invers Composisition Convolution Acceptance Rejection (AR) Adaptive Acceptence Rejection (AAR) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 92
Transformasi Invers Syarat Transformasi Invers Metodenya adalah sbb: Fungsi mempunyai CDF yang close form Metodenya adalah sbb: x u 1 F(x) Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 93
Composition (Mixture form) Perhatikan bentuk fungsi berikut Half Normal Exponential I II f(x) Dimana data di daerah I dibangkitkan dengan Normal dan di daerah II dengan Exponential Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 94
Convolution Misalkan sebuah fungsi Erlang(m ), maka cara pembangkitan datanya adalah dengan mengkonvolusikan data bangkitan Exponential( ). Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 95
Acceptance Rejection (AR) Sangat bagus untuk fungsi yang tidak jelas pdf atau bukan Dapat mengakomodasikan fungsi yang tidak mempunyai CDF close form Caranya adalah sbb: tx f(x) Reject Accept rx Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 96
Algoritma AR Bangkitkan x ~ rx Bangkitkan u ~ U(0,1) If then Accept x Else Reject x Nur Iriawan Bayesian Modeling, PENS – ITS - 2006 97