Object-Oriented Reengineering Patterns and Techniques. Wahyu Andhyka Kusuma, S.Kom. Lecture 5: Problem Detection.

Object-Oriented Reengineering Patterns and Techniques
Wahyu Andhyka Kusuma, S.Kom
kusuma.wahyu.a@gmail.com, 081233148591
Lecture 5: Problem Detection

Topics
Metrics
Object-Oriented Metrics in Practice
Duplicated Code

Topics
Metrics
– Software quality
– Trend analysis
Object-Oriented Metrics in Practice
Duplicated Code

7.4 Why Use Metrics in OO Reengineering?
Assessing software quality
– Which components have poor quality? (so they can be reengineered)
– Which components have good quality? (so they can be reverse engineered)
→ Metrics as a reengineering tool
Controlling the reengineering process
– Trend analysis: which components have changed?
– Which refactorings can be applied?
→ Metrics as a reverse engineering tool!

7.5 ISO 9126 Quantitative Quality Model
Software quality (ISO 9126) is broken down into factors, characteristics and metrics:
Factor: Functionality, Reliability, Efficiency, Usability, Maintainability, Portability
Characteristic: Error tolerance, Accuracy, Simplicity, Modularity, Consistency
Metric:
– defect density = #defects / size
– correction impact = #components changed
– correction time
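
The three example metrics on this slide can be computed directly from defect records. The following is a minimal sketch only: the `Defect` record, its field names, and the choice of KLOC as the size unit are assumptions made for illustration, not part of ISO 9126.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    # Hypothetical record; field names are illustrative assumptions.
    components_changed: int  # components touched by the correction
    hours_to_fix: float      # elapsed correction time


def defect_density(defects, size_kloc):
    """defect density = #defects / size (size assumed to be in KLOC)."""
    return len(defects) / size_kloc


def mean_correction_impact(defects):
    """Average number of components changed per correction."""
    return sum(d.components_changed for d in defects) / len(defects)


def mean_correction_time(defects):
    """Average time needed to correct one defect."""
    return sum(d.hours_to_fix for d in defects) / len(defects)
```

Averaging per correction is one possible aggregation; a team could just as well track the maximum or the distribution.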

7.6 Product & Process Attributes
Product Attribute
– Definition: measures an aspect of the artifacts delivered to the customer
– Example: number of system defects, time to learn the system
Process Attribute
– Definition: measures an aspect of the process that produces the product
– Example: time to correct a defect, number of components changed per correction

7.7 External & Internal Attributes
External Attribute
– Definition: measures how the product/process behaves in its environment
– Example: mean time between failures, #components changed
Internal Attribute
– Definition: measured purely in terms of the product itself, separate from its behaviour in context
– Example: class coupling and cohesion, method size

7.8 External vs. Internal Product Attributes
External
– Advantage: close relationship with quality factors
– Disadvantage: can be measured only after the product is in use
– Disadvantage: data collection is difficult and often involves user intervention
– Disadvantage: relating external effects to internal causes is very difficult
Internal
– Disadvantage: relationship with quality factors is not empirically validated
– Advantage: can be measured at any time
– Advantage: data collection is easy and can be automated
– Advantage: direct relationship between the measurement and its cause

7.9 Metrics and Measurements
Weyuker [1988] defined nine properties that a software metric should satisfy. For OO, only 6 properties are really important [Chidamber 94, Fenton & Pfleeger]:
– Noncoarseness: given a class P and a metric m, another class Q can always be found such that m(P) ≠ m(Q). Not every class has the same value for the metric.
– Nonuniqueness: there can be distinct classes P and Q such that m(P) = m(Q). Two classes can have the same metric value.
– Monotonicity: m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the “combination” of the classes P and Q.

7.10 Metrics and Measurements (continued)
– Design details are important: the specifics of a class must influence the metric value. Even if two classes perform the same actions, their details should have an impact on the metric value.
– Nonequivalence of interaction: m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R is a class that interacts with them.
– Interaction increases complexity: m(P) + m(Q) < m(P+Q). When two classes are combined, the interaction between them also increases the metric value.
Conclusion: not every measurement is a metric.

7.11 Selecting Metrics
Fast
– Scalable: we cannot afford log(n²) when n ≥ 1 million LOC (lines of code)
Precise
– (e.g., #methods: do you count all methods, only public ones, also inherited ones?)
Code-based
– Scalable: we want to be able to collect the metrics several times
Simple
– Complex metrics are hard to interpret

7.12 Assessing Maintainability
Size of the system and of the system entities
– Class size, method size, inheritance
– The size of an entity affects its maintainability
Cohesion of the entities
– Class internals
– Changes should stay local to the class
Coupling between entities
– Within inheritance: coupling between class and subclass
– Outside of inheritance
– Strong coupling works against the locality of changes

7.13 Sample Size and Inheritance Metrics
(Figure: metamodel with the entities Class, Attribute, Method and the relations Access, Invoke, BelongTo, Inherit)
Inheritance Metrics
– hierarchy nesting level (HNL)
– # immediate children (NOC)
– # inherited methods, unmodified (NMI)
– # overridden methods (NMO)
Class Size Metrics
– # methods (NOM)
– # instance / class attributes (NIA, NCA)
– sum of method size (WMC)
Method Size Metrics
– # invocations (NOI)
– # statements (NOS)
– # lines of code (LOC)

7.14 Sample Class Size
– (NIV) [Lore94] Number of Instance Variables
– (NCV) [Lore94] Number of Class Variables (static)
– (NOM) [Lore94] Number of Methods (public, private, protected)
– (LOC) Lines of Code
– (NSC) Number of Semicolons [Li93] → number of statements
– (WMC) [Chid94] Weighted Method Count: WMC = ∑ c_i, where c_i is the complexity of method i (number of exits or McCabe cyclomatic complexity)
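
To make the WMC definition concrete, the sketch below sums a per-method complexity over the methods of one class. It is an illustration under stated assumptions: it analyses Python source with the standard `ast` module, and the complexity c_i is a rough approximation of McCabe's measure (1 plus the number of branching nodes), not an exact implementation. The `Account` class is a made-up sample.

```python
import ast

# Node types counted as decision points (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(func):
    """Approximate McCabe complexity: 1 + number of decision points."""
    return 1 + sum(isinstance(n, DECISION_NODES) for n in ast.walk(func))


def wmc(source, class_name):
    """WMC = sum of c_i over the methods defined directly in the class."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            return sum(cyclomatic(m) for m in methods)
    raise ValueError(f"class {class_name!r} not found")


SAMPLE = '''
class Account:
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("bad amount")
        self.balance += amount

    def total(self, items):
        s = 0
        for i in items:
            if i > 0:
                s += i
        return s
'''

print(wmc(SAMPLE, "Account"))  # deposit contributes 2, total contributes 3
```

Setting every c_i to 1 would reduce WMC to NOM, which is why the choice of the complexity function matters when comparing values across tools.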

7.15 Hierarchy Layout
– (HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree: HNL, DIT = max hierarchy level
– (NOC) [Chid94] Number of Children
– (WNOC) Total Number of Children
– (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
– (SIX) [Lore94] SIX(C) = NMO * HNL / NOM, the weighted percentage of overridden methods
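
A small worked example of SIX(C) = NMO * HNL / NOM. This is a sketch, assuming Python classes with single inheritance, with NOM taken as the methods defined in the class itself and HNL as the number of ancestors below `object`. The `Shape` hierarchy is hypothetical.

```python
import inspect


def own_methods(cls):
    """Names of the methods defined directly in cls."""
    return {name for name, value in vars(cls).items() if inspect.isfunction(value)}


def hnl(cls):
    """Hierarchy nesting level: ancestors excluding cls itself and object."""
    return len(cls.__mro__) - 2


def nmo(cls):
    """Number of methods in cls that override a method of an ancestor."""
    inherited = set()
    for base in cls.__mro__[1:-1]:
        inherited |= own_methods(base)
    return len(own_methods(cls) & inherited)


def six(cls):
    nom = len(own_methods(cls))
    return nmo(cls) * hnl(cls) / nom if nom else 0.0


class Shape:
    def area(self): return 0
    def describe(self): return "shape"


class Polygon(Shape):
    def area(self): return 1   # overrides Shape.area
    def sides(self): return 3  # added


class Square(Polygon):
    def area(self): return 4       # overridden
    def sides(self): return 4      # overridden
    def diagonal(self): return 2   # added
    def perimeter(self): return 8  # added


print(six(Square))  # NMO = 2, HNL = 2, NOM = 4
```

A class deep in the hierarchy that overrides most of what it inherits gets a high value, which hints that subclassing may not be the right relationship.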

7.16 Method Size
– (MSG) Number of Message Sends
– (LOC) Lines of Code
– (MCX) Method Complexity: total complexity / total number of methods, using weights such as API calls = 5, assignment = 0.5, arithmetic op = 2, messages with params = 3, ...

7.17 Sample Metrics: Class Cohesion
(LCOM) Lack of Cohesion in Methods
– [Chidamber 94] for definition
– [Hitz 95] for critique
– I_i = set of instance variables used by method M_i
– let P = { (I_i, I_j) | I_i ∩ I_j = ∅ } and Q = { (I_i, I_j) | I_i ∩ I_j ≠ ∅ }; if all the sets are empty, P is empty
– LCOM = |P| − |Q| if |P| > |Q|, and 0 otherwise
Tight Class Cohesion (TCC), Loose Class Cohesion (LCC)
– [Bieman 95] for definition
– Measure method cohesion across invocations
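
The LCOM definition translates almost directly into code. The sketch below assumes the sets of instance variables used by each method have already been extracted; the mapping `uses` and the method names in it are hypothetical input.

```python
from itertools import combinations


def lcom(uses):
    """LCOM = |P| - |Q| if |P| > |Q|, else 0.

    `uses` maps each method name to the set of instance variables it uses.
    P counts method pairs sharing no variable, Q counts pairs sharing some.
    """
    p = q = 0
    for vars_a, vars_b in combinations(uses.values(), 2):
        if vars_a & vars_b:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0


uses = {
    "open": {"path"},
    "resize": {"width"},
    "close": {"path", "handle"},
}
print(lcom(uses))  # two disjoint pairs, one overlapping pair
```

Because negative differences are clamped to 0, many quite different classes share the value 0, which is one of the points raised in the [Hitz 95] critique.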

7.18 Sample Metrics: Class Coupling (i)
Coupling Between Objects (CBO)
– [Chidamber 94a] for definition
– [Hitz 95a] for a discussion
– Number of other classes to which a class is coupled
Data Abstraction Coupling (DAC)
– [Li 93] for definition
– Number of ADTs defined in a class
Change Dependency Between Classes (CDBC)
– [Hitz 96a] for definition
– Impact of changes from a server class (SC) to a client class (CC)

7.19 Sample Metrics: Class Coupling (ii)
Locality of Data (LD)
– [Hitz 96] for definition
– LD = ∑ |L_i| / ∑ |T_i|
– L_i = non-public instance variables + inherited protected variables of the superclass + static variables of the class
– T_i = all variables used in M_i, except non-static local variables
– M_i = methods, without accessors

7.20 The Trouble with Coupling and Cohesion
Coupling and cohesion are intuitive notions
– Cf. “computability”
– E.g., is a library of mathematical functions “cohesive”?
– E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?

7.21 Conclusion: Metrics for Quality Assessment
Can internal product metrics reveal which components have good/poor quality?
Yes, but...
– Not reliable
  false positives: “bad” measurements, yet good quality
  false negatives: “good” measurements, yet poor quality
– Heavyweight approach
  Requires the team to develop (customize?) a quantitative quality model
  Requires definition of thresholds (trial and error)
– Difficult to interpret
  Requires complex combinations of simple metrics
However...
– Cheap once you have the quality model and the thresholds
– Good focus (± 20% of components are selected for further inspection)
Note: focus on the most complex components first!

Topics
Metrics
Object-Oriented Metrics in Practice
– Detection strategies, filters and composition
– Sample detection strategies: God Class …
Duplicated Code

7.23 Detection Strategy
A detection strategy is a metrics-based predicate used to identify candidate software artifacts that conform to (or violate) a particular design rule.

7.24 Filters and Composition
A data filter is a predicate used to focus attention on a subset of interest within a larger data set
– Statistical filters: i.e., the top and bottom 25% are considered outliers
– Other relative thresholds: i.e., other percentages to identify outliers (e.g., top 10%)
– Absolute thresholds: i.e., fixed criteria, independent of the data set
A useful detection strategy can often be expressed as a composition of data filters
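
The idea of composing data filters can be sketched in a few lines. Everything specific here is an assumption for illustration: classes are plain dictionaries of metric values, and the composed strategy (top 25% by WMC, then NOM above 20) and its thresholds are hypothetical, not a published detection strategy.

```python
def top_fraction(entities, metric, fraction):
    """Relative filter: keep the top `fraction` of entities ranked by `metric`."""
    ranked = sorted(entities, key=lambda e: e[metric], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def higher_than(entities, metric, threshold):
    """Absolute filter: keep entities whose `metric` exceeds a fixed threshold."""
    return [e for e in entities if e[metric] > threshold]


def complex_and_large(entities):
    """A composed strategy: statistical filter first, absolute filter second."""
    return higher_than(top_fraction(entities, "WMC", 0.25), "NOM", 20)


classes = [
    {"name": "OrderManager", "WMC": 120, "NOM": 40},
    {"name": "Parser", "WMC": 90, "NOM": 10},
    {"name": "Invoice", "WMC": 30, "NOM": 25},
    {"name": "Customer", "WMC": 20, "NOM": 5},
    {"name": "Address", "WMC": 15, "NOM": 6},
    {"name": "Money", "WMC": 10, "NOM": 8},
    {"name": "Logger", "WMC": 8, "NOM": 4},
    {"name": "Config", "WMC": 5, "NOM": 3},
]
print([c["name"] for c in complex_and_large(classes)])
```

Relative filters adapt to the system under study, while absolute thresholds stay comparable across systems; combining them keeps the candidate list short.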

7.25 God Class
A God Class centralizes intelligence in the system
– Impacts understandability
– Increases system fragility

7.26 Feature Envy
Methods that are more interested in the data of other classes than in that of their own class [Fowler et al. 99]

7.27 Data Class
A Data Class provides data to other classes but little or no functionality of its own

7.28 Data Class (2)

7.29 Shotgun Surgery
A change in an operation implies many (small) changes to a lot of different operations and classes

Topics
Metrics
Object-Oriented Metrics in Practice
Duplicated Code
– Detection techniques
– Visualizing duplicated code

7.31 Code is Copied
Example from the Mozilla Distribution (Milestone 9), taken from /dom/src/base/nsLocation.cpp

7.32 How Much Code is Duplicated?
Usually estimated: 8 to 12% of the code

Case study        LOC       Duplication without comments   Duplication with comments
gcc               460’000   8.7%                           5.6%
Database Server   245’000   36.4%                          23.3%
Payroll           40’000    59.3%                          25.4%
Message Board     6’500     29.4%                          17.4%

7.33 What is Duplicated Code?
Duplicated code = a piece of program code that is also found elsewhere in the same system
– In different files
– In the same file but in different methods
– In the same method
The pieces must share the same logic or structure, so that they can be abstracted.

7.34 Problems of Duplication
Generally has a negative effect
– Code bloat
Negative effect on system or software maintenance
– Copying spreads defects through the code
– Software aging, “hardening of the arteries”
– “Software entropy” increases: even small design changes become very difficult to effect

7.35 Detecting Duplicated Code
Nontrivial problem:
– No a priori knowledge about which code has been copied
– How to find all clone pairs among all possible pairs of segments?

7.36 General Schema of Detection Process

Author            Level         Transformed Code        Comparison Technique
Johnson 94        Lexical       Substrings              String-Matching
Ducasse 99        Lexical       Normalized Strings      String-Matching
Baker 95          Syntactical   Parameterized Strings   String-Matching
Mayrand 96        Syntactical   Metric Tuples           Discrete comparison
Kontogiannis 97   Syntactical   Metric Tuples           Euclidean distance
Baxter 98         Syntactical   AST                     Tree-Matching

7.37 Recall and Precision
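
In clone detection, recall is usually understood as the fraction of the real clones that a detector reports, and precision as the fraction of the reported clones that are real. A minimal sketch, assuming clone pairs are represented as hashable tuples and a manually validated reference set is available (both are assumptions for illustration):

```python
def precision_recall(reported, actual):
    """Return (precision, recall) for reported versus actual clone pairs."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Pairs are (line in copy 1, line in copy 2); the values are made up.
reported = {(1, 5), (2, 6), (3, 7), (9, 9)}
actual = {(1, 5), (2, 6), (4, 8)}
precision, recall = precision_recall(reported, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The two measures pull against each other: a looser comparison finds more real clones but also reports more false ones.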

7.38 Simple Detection Approach (i)
Assumption: code segments are just copied and changed in a few places
Noise elimination transformation
– remove white space and comments
– remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)
Before the transformation:
    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL) {
    int l = strlen(fidptr);
    fastid = new char[ l + 1 ];
    …
After the transformation:
    …
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr);
    fastid=newchar[l+1];
    …
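
The noise elimination step can be sketched as a small normalization function. This is an illustrative approximation, not the tool behind the slides: it strips C/C++ comments with regular expressions (so comment markers inside string literals would be mishandled), removes all white space, and drops lines listed in the assumed `UNINTERESTING` set.

```python
import re

# Lines that carry no information after normalization (an assumed list).
UNINTERESTING = {"", "{", "}", "};", "else"}


def normalize(source):
    """Return (original line number, normalized text) for each interesting line."""
    # Blank out block comments but keep their newlines so numbering stays valid.
    source = re.sub(r"/\*.*?\*/", lambda m: "\n" * m.group().count("\n"),
                    source, flags=re.S)
    result = []
    for number, line in enumerate(source.splitlines(), start=1):
        line = re.sub(r"//.*", "", line)  # line comments
        line = re.sub(r"\s+", "", line)   # all white space
        if line not in UNINTERESTING:
            result.append((number, line))
    return result


code = """// assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if (fidptr != NULL) {
    int l = strlen(fidptr);
}
"""
for number, text in normalize(code):
    print(number, text)
```

Keeping the original line numbers next to the normalized text lets later steps report matches in terms of the unmodified source.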

7.39 Simple Detection Approach (ii)
Code Comparison Step
– Line-based comparison (assumption: the layout did not change during copying)
– Compare each line with each other line
– Reduce the search space by hashing
  Preprocessing: compute the hash value for each line
  Actual comparison: compare all lines in the same hash bucket
Evaluation of the Approach
– Advantages: simple, language independent
– Disadvantages: difficult interpretation
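
The hashing step can be sketched as follows. The input format, a list of (line number, normalized text) tuples, is an assumption; the sketch uses Python's built-in `hash` for the buckets and still compares the texts inside a bucket, since two different lines may land in the same bucket.

```python
from collections import defaultdict


def matching_line_pairs(lines):
    """Return sorted (i, j) pairs, i < j, of line numbers with identical text."""
    # Preprocessing: put every line into the bucket of its hash value.
    buckets = defaultdict(list)
    for number, text in lines:
        buckets[hash(text)].append((number, text))

    # Actual comparison: only lines within the same bucket are compared.
    pairs = []
    for bucket in buckets.values():
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                (a, text_a), (b, text_b) = bucket[i], bucket[j]
                if text_a == text_b:  # guards against hash collisions
                    pairs.append((min(a, b), max(a, b)))
    return sorted(pairs)


lines = [(1, "a=1;"), (2, "b=2;"), (3, "a=1;"), (4, "b=2;"), (5, "c=3;")]
print(matching_line_pairs(lines))
```

With n lines spread over many buckets, the number of comparisons drops far below the n² of the naive each-against-each approach, unless a single line is repeated very often.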

7.40 A Perl script for C++ (i)

7.41 A Perl script for C++ (ii)
– Handles multiple files
– Removes comments and white spaces
– Controls noise (if, {,)
– Granularity (number of lines)
– Possible to remove keywords

7.42 Output Sample
Lines:
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pnMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations: 6178/6179/6180/6181/6182 6198/6199/6200/6201/6202
Lines:
    create_property(pd,pnSupertype,stReference,true,*iSupertype);
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations: 6177/6178 6229/6230
Lines = duplicated lines
Locations = file names and line numbers

7.43 Enhanced Simple Detection Approach
Code Comparison Step
– As before, but now
  Collect consecutive matching lines into match sequences
  Allow holes in the match sequence
Evaluation of the Approach
– Advantages: identifies more real duplication, language independent
– Disadvantages: less simple, misses copies with (small) changes on every line
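
Collecting matching lines into sequences with holes can be sketched like this. It is a simplified illustration: matches are (i, j) line-number pairs, a sequence only continues along the same diagonal (both copies advance by the same number of lines), and `max_hole` is an assumed parameter name for the largest tolerated gap.

```python
def match_sequences(pairs, max_hole=0):
    """Group matching line pairs (i, j) into diagonal match sequences.

    A sequence continues from (i, j) to (i + k, j + k) for the smallest
    k with 1 <= k <= max_hole + 1 for which that pair is a match.
    """
    remaining = set(pairs)
    sequences = []
    for start in sorted(pairs):
        if start not in remaining:
            continue  # already absorbed into an earlier sequence
        remaining.discard(start)
        run = [start]
        i, j = start
        while True:
            for step in range(1, max_hole + 2):
                candidate = (i + step, j + step)
                if candidate in remaining:
                    remaining.discard(candidate)
                    run.append(candidate)
                    i, j = candidate
                    break
            else:
                break  # no continuation within the allowed hole size
        sequences.append(run)
    return sequences


pairs = [(1, 11), (2, 12), (4, 14), (5, 15), (9, 30)]
print(match_sequences(pairs, max_hole=0))
print(match_sequences(pairs, max_hole=1))
```

Allowing a hole of one line merges the two short runs into one longer sequence, which is how a copy with a single edited line is still recognized.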

7.44 Abstraction
– Abstracting selected syntactic elements can increase recall, at the possible cost of precision

7.45 Metrics-based Detection Strategy
Duplication is significant if:
– It is the largest possible duplication chain uniting all exact clones that are close enough to each other
– The duplication is large enough

7.46 Automated Detection in Practice
Wettel [MSc thesis, 2004] uses three thresholds:
– Minimum clone length: the minimum number of lines present in a clone (e.g., 7)
– Maximum line bias: the maximum number of lines between two exact chunks (e.g., 2)
– Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
Mihai Balint, Tudor Gîrba and Radu Marinescu, “How Developers Copy,” ICPC 2006
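
One plausible reading of how the three thresholds interact is sketched below. It is not Wettel's algorithm: the chunk representation (start line in copy A, start line in copy B, length), the greedy chaining of chunks in sorted order, and measuring clone length as the span from first to last line are all assumptions made for illustration.

```python
def significant_chains(chunks, min_clone_length=7, max_line_bias=2, min_chunk_size=3):
    """Chain exact chunks (start_a, start_b, length) into significant clones."""
    # Minimum chunk size: ignore exact matches that are too short.
    usable = sorted(c for c in chunks if c[2] >= min_chunk_size)

    chains, current = [], []
    for chunk in usable:
        if current:
            prev_a, prev_b, prev_len = current[-1]
            gap_a = chunk[0] - (prev_a + prev_len)
            gap_b = chunk[1] - (prev_b + prev_len)
            # Maximum line bias: chunks must be close enough in both copies.
            if 0 <= gap_a <= max_line_bias and 0 <= gap_b <= max_line_bias:
                current.append(chunk)
                continue
            chains.append(current)
        current = [chunk]
    if current:
        chains.append(current)

    def span(chain):
        first, last = chain[0], chain[-1]
        return last[0] + last[2] - first[0]

    # Minimum clone length: keep only chains that cover enough lines.
    return [chain for chain in chains if span(chain) >= min_clone_length]


chunks = [(10, 50, 4), (16, 56, 3), (40, 90, 3), (100, 200, 2)]
print(significant_chains(chunks))
```

In the example the first two chunks are separated by two changed lines and together span nine lines, so they form one significant clone; the isolated three-line chunk and the two-line chunk are discarded.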

7.47 Visualization of Duplicated Code
Visualization provides insights into the duplication situation
– A simple version can be implemented in three days
– Scalability issue
Dotplots: a technique from DNA analysis
– Code is put on the vertical as well as the horizontal axis
– A match between two elements is a dot in the matrix
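
A text-only dotplot is enough to see the idea. This is a minimal sketch, assuming the lines have already been normalized and that an exact string match counts as a dot; the sample lines are made up.

```python
def dotplot(lines_a, lines_b):
    """Return one string per line of lines_a; '*' marks a match with lines_b."""
    return ["".join("*" if a == b else "." for b in lines_b) for a in lines_a]


lines = ["a=1;", "b=2;", "c=3;", "a=1;", "b=2;"]
for row in dotplot(lines, lines):
    print(row)
```

Comparing a file with itself always yields the main diagonal; any additional diagonal, such as the two-dot runs produced here, marks a copied sequence. Since the matrix grows quadratically with the number of lines, a real tool has to zoom or aggregate, which is the scalability issue noted on the slide.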

7.48 Visualization of Copied Code Sequences
Detected Problem
– File A contains two copies of a piece of code
– File B contains another copy of this code
Possible Solution: Extract Method
All examples are made using Duploc from an industrial case study (1 Mio LOC C++ system)

7.49 Visualization of Repetitive Structures
Detected Problem: 4 object factory clones, where a switch statement over a type variable is used to call individual construction code
Possible Solution: Strategy Method

7.50 Visualization of Cloned Classes
(Figure: dotplot with axes Class A and Class B)
Detected Problem: Class A is an edited copy of Class B (editing and insertion)
Possible Solution: Subclassing …

7.51 Visualization of Clone Families
20 classes implementing lists for different data types
(Figure panels: Overview, Detail)

7.52 Conclusion
Duplicated code is a real problem
– It makes a system progressively harder to change
Detecting duplicated code is a hard problem
– Some simple techniques can help
– Tool support is also needed
Visualization of code is very useful
Dealing with duplicated code can be a topic for research
