

Presentation titled: "SHINTA P. What is the first thing you read in a novel? Summarizing retrieved web pages relevant to a user's query. Needed…" Presentation transcript:

1 SHINTA P.

2  What is the first thing you read in a novel? Summarizing retrieved web pages relevant to a user's query.  An automatic summarization engine is needed: human-generated summaries are expensive.

3 SIGIR'99 Tutorial Automated Text Summarization, August 15, 1999, Berkeley, CA


8 TRIBUNNEWS.COM, BANDA ACEH - The nine members of the girl group Cherrybelle performed at the Hermes Palace Hotel, Banda Aceh, on Tuesday night (30/4/2013). In last night's concert they played songs from both their first album and their latest one, Diam-diam Suka. When they opened with the song Best Friend Forever, Cherrybelle's fans greeted them with screams. On stage, the nine performers wore different outfits than usual: unlike at shows in other regions, their skimpy costumes were set aside, replaced by long-sleeved blouses in calm colors. And their hair? Cherrybelle appeared plain, without any accessories: only ponytails and a few hairpins, and no headscarf or hijab, as is customary for other Acehnese women. Only during preparations before last night's show did the members of Cherrybelle set aside time to answer questions from Serambinews.com (Tribunnews.com Network); for the interview and photo session they did put on white headscarves. In the brief interview they said Banda Aceh was the first city on this roadshow, and that they had been drawn to the city from the moment they arrived at Sultan Iskandar Muda Airport in Blangbintang, Aceh Besar. The Aceh concert is part of the Cherrybelle Beat Indonesia roadshow covering 33 provinces in 31 days. "We've been taken with Banda Aceh since we set foot in the airport. The airport is really beautiful, so different from airports in other cities. The roof here is shaped like a mosque dome, it's gorgeous," they said in unison in the exclusive interview with Serambinews.com.

9 Purpose: Indicative vs. Informative ◦ Indicative - indicates types of information ("alerts"): "The work of Consumer Advice Centres is examined…" ◦ Informative: "The work of Consumer Advice Centres was found to be a waste of resources due to low availability…" ◦ Critical / Evaluative: evaluates the content of the document

10  Abstract (IR)  Extract (IE)

11  Single-Document  Multi-Document

12  Query-Independent  Query-Specific

13  Shallow Approach ◦ Works only on the surface of the document ◦ Output is a sentence extraction ◦ Can be out of context  Deep Method ◦ Output is an abstract

14 Top-Down:  I know what I want! — don't confuse me with drivel!  The user wants only a particular kind of information.  The system needs specific criteria of interest, used to focus its search. Bottom-Up: I'm dead curious: what's in the text? The user wants all the important information. The system needs measures of importance to guide its search.

15  IE task: Given a form and a text, find all the information relevant to each slot of the form and fill it in.  Summ-IE task: Given a query, select the best form, fill it in, and generate the contents. Questions: 1. IE works only for very particular forms; can it scale up? 2. What about info that doesn't fit into any form—is this a generic limitation of IE? [Figure: a source text mapped into a filled slot-and-filler form]

16  IR task: Given a query, find the relevant document(s) from a large set of documents.  Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents). Questions: 1. IR techniques work on large volumes of data; can they scale down accurately enough? 2. IR works on words; do abstracts require abstract representations? [Figure: relevant passages highlighted within a document]

17 IE: Approach: try to 'understand' the text; transform content into a 'deeper' notation, then manipulate that. Need: rules for text analysis and manipulation, at all levels. Strengths: higher quality; supports abstracting. Weaknesses: speed; still needs to scale up to robust open-domain summarization. IR: Approach: operate at the word level; use word frequency, collocation counts, etc. Need: large amounts of text. Strengths: robust; good for query-oriented summaries. Weaknesses: lower quality; inability to manipulate information at abstract levels.

18 Combine the strengths of both paradigms: use IE/NLP when you have suitable form(s), use IR when you don't… but how exactly to do it?

19 [Figure: the space of summary types: extracts vs. abstracts, single- vs. multi-document, indicative vs. informative, generic vs. query-oriented, background vs. just-the-news, and lengths from headline/very brief (10%) through brief (50%) to long (100%); built from representations such as index terms, clause fragments, case frames, templates, core concepts, core events, and relationships]

20 [Figure: processing pipeline: DOC -> EXTRACTION -> extracts -> INTERPRETATION (via case frames, templates, core concepts, core events, relationships, clause fragments, index terms) -> abstracts; FILTERING reduces multi-document extracts; GENERATION produces the final summary]

21 1. Measurement: compression rate = summary length / original document length. 2. Informativeness: trust in the source, and whether it is biased, especially for evaluative summaries. 3. Well-formedness: repairing dangling references, disconnected sentences, and anaphora (unclear references).
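The compression-rate measure above is straightforward to compute. A minimal sketch in Python, counting length in words (the helper name and sample strings are illustrative, not from the slides):

```python
def compression_rate(summary: str, document: str) -> float:
    """Compression rate = summary length / original document length,
    measured here in words (characters or sentences would also work)."""
    return len(summary.split()) / len(document.split())

# Toy example: a 5-word summary of a 13-word document.
doc = "the cat sat on the mat and then the cat slept all afternoon"
summ = "the cat sat then slept"
rate = compression_rate(summ, doc)
print(f"{rate:.2f}")  # about 0.38
```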

22 Document collection -> selected sentences. Select sentences: methods such as Zipf's law, TF-IDF, etc. Extract: order the chosen sentences by their location in the source, apply smoothing, and rewrite them as well-formed sentences. Coherent summary: readable and understandable.

23 1. Topic Identification: find/extract the most important material 2. Topic Interpretation: compress it 3. Summary Generation: say it in your own words …as easy as that!

24  Language: ◦ Syntax = grammar, sentence structure ("sleep colorless furiously ideas green" — no syntax) ◦ Semantics = meaning ("colorless green ideas sleep furiously" — no semantics)  Evaluation: ◦ Recall = how many of the things you should have found/did, did you actually find/do? ◦ Precision = of those you actually found/did, how many were correct?
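The recall and precision definitions above can be made concrete over sets of found and relevant items. A small sketch (the item labels are made up):

```python
def precision_recall(found: set, relevant: set) -> tuple:
    """Precision: of those you actually found, how many were correct?
    Recall: of the things you should have found, how many did you find?"""
    hits = len(found & relevant)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 3 items found, 4 truly relevant, 2 in common.
p, r = precision_recall({"s1", "s2", "s3"}, {"s2", "s3", "s4", "s5"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```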

25  Intriksi, ◦ Menguji sendiri dengan kriteria tertentu:  Koherens, mudah dibaca dn dimengerti  Informatifness, dapat memberikan informasi tentang doc asli  Ekstrinsi ◦ Menguji sistem dalam hubungannya dengan tugas lain dengan meminta orang lain untuk mengevaluasi.

26 A. Bellaachia  The main steps of SUMMARIZER 1 are: 1. For each sentence S_i ∈ S, compute the relevance measure between S_i and D: inner product, cosine similarity, or Jaccard coefficient. 2. Select the sentence S_k with the highest relevance score and add it to the summary. 3. Delete S_k from S, and eliminate all the terms contained in S_k from the document vector and the sentence vectors; re-compute the weighted term-frequency vectors (D and all S_i). 4. If the number of sentences in the summary reaches the predefined value, terminate; otherwise go to step 1.
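The four steps of SUMMARIZER 1 can be sketched as below, assuming cosine similarity over raw term-frequency vectors; the function names and tokenization are illustrative choices, not prescribed by the slide:

```python
import math
import re
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_summarize(sentences, n):
    # Step 1: term-frequency vectors for each sentence and the whole document.
    vectors = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    doc = sum(vectors, Counter())
    remaining = list(range(len(sentences)))
    summary = []
    while remaining and len(summary) < n:
        # Step 2: pick the sentence most relevant to the document vector.
        k = max(remaining, key=lambda i: cosine(vectors[i], doc))
        summary.append(sentences[k])
        remaining.remove(k)
        # Step 3: eliminate S_k's terms from D and the remaining S_i vectors.
        for term in list(vectors[k]):
            doc.pop(term, None)
            for i in remaining:
                vectors[i].pop(term, None)
    return summary  # step 4: stop once the summary reaches n sentences
```

Dropping the selected sentence's terms before the next pick pushes later selections toward content not yet covered by the summary.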

27  This summarizer is the simplest of the proposed techniques.  It uses the TF*IDF weighting scheme to select sentences.  It works as follows: 1. Create the weighted term-frequency vector S_i for each sentence i ∈ S using TF*IDF (term frequency * inverse document frequency). 2. Sum up the TF*IDF scores for each sentence and rank them. 3. Select the predefined number of sentences for the summary from S.
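A minimal sketch of these three steps, with one assumption the slide leaves open: IDF is computed over the sentences themselves, each sentence treated as a "document":

```python
import math
import re

def tfidf_summarize(sentences, n):
    # Step 1: weighted term-frequency vectors via TF*IDF.
    toks = [re.findall(r"\w+", s.lower()) for s in sentences]
    N = len(sentences)
    df = {}
    for terms in toks:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1

    def score(terms):
        # Sum of TF*IDF weights over the sentence's distinct terms.
        return sum(terms.count(t) * math.log(N / df[t]) for t in set(terms))

    # Step 2: rank sentences by total TF*IDF score.
    ranked = sorted(range(N), key=lambda i: score(toks[i]), reverse=True)
    # Step 3: keep the top n, restored to their original order.
    return [sentences[i] for i in sorted(ranked[:n])]
```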

28  This summarizer uses the popular k-means clustering algorithm, where k is the size of the summary.  K-means: ◦ Start with random positions for the K centroids. ◦ Iterate until the centroids are stable: assign points to the nearest centroid, then move each centroid to the center of its assigned points. (The original slides 28-31 animate iterations 0 through 3.)


32  This summarizer works as follows: 1. Create the weighted term-frequency vector A_i for each sentence S_i using TF*IDF. 2. Form a sentences-by-terms matrix and feed it to the k-means clustering algorithm to generate k clusters. 3. Sum up the TF*IDF scores for each sentence in each cluster. 4. Pick the sentence with the highest TF*IDF score from within each cluster and add it to the summary.
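The four steps can be sketched with a small, dependency-free k-means. One deliberate deviation from the slides: instead of random centroid positions, this sketch seeds the centroids with the first k sentence vectors so that the output is reproducible; all names are illustrative, and k is assumed not to exceed the number of sentences:

```python
import math
import re

def kmeans_summarize(sentences, k, iters=10):
    # Step 1: TF*IDF vector A_i for each sentence (IDF over the sentences).
    toks = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({t for ts in toks for t in ts})
    N = len(sentences)
    df = {t: sum(t in ts for ts in toks) for t in vocab}
    vecs = [[ts.count(t) * math.log(N / df[t]) for t in vocab] for ts in toks]

    def dist(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Step 2: k-means over the sentences-by-terms matrix.
    centroids = [v[:] for v in vecs[:k]]  # deterministic seeding (see note)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vecs):  # assign points to the nearest centroid
            clusters[min(range(k), key=lambda c: dist(v, centroids[c]))].append(i)
        for c, members in enumerate(clusters):  # move centroids to the mean
            if members:
                centroids[c] = [sum(vecs[i][d] for i in members) / len(members)
                                for d in range(len(vocab))]

    # Steps 3-4: highest total TF*IDF sentence from each non-empty cluster.
    best = sorted(max(ms, key=lambda i: sum(vecs[i])) for ms in clusters if ms)
    return [sentences[i] for i in best]
```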

