Chapter 3: Query Optimization
Advanced Databases course, Universitas Al Azhar Indonesia
Endang Ripmiatin, Apr 2016
Topics
Introduction
Translating SQL queries into relational algebra
Using heuristics in query optimization
Introduction
Query optimization: the process of choosing the most suitable execution strategy for the query being processed.
Internal representation of a query: the query tree.
1. Parsing and translating: the high-level query is converted into an internal representation.
2. Optimization: the internal representation is optimized using heuristic rules.
3. Evaluation: a query-execution plan is chosen to carry out the operations, based on the access paths available on the tables involved in the query.
[Figure: query processing steps. Query in a high-level language → scanner, parser, translator → extended relational algebra expression → query optimizer (using statistics about the data) → execution plan → query-execution engine (reading the data) → query output]
Translating SQL queries into relational algebra
Query block: the basic unit that is translated into algebraic operators and optimized.
A query block consists of a single SELECT-FROM-WHERE expression, including its GROUP BY and HAVING clauses if present.
Nested queries are identified as separate query blocks.
Aggregate operators (for example MIN, MAX, and AVG) are included in the extended algebra.
Translating nested queries into relational algebra
    SELECT LNAME, FNAME
    FROM EMPLOYEE
    WHERE SALARY > (SELECT MAX(SALARY)
                    FROM EMPLOYEE
                    WHERE DNO = 5);
The query is decomposed into two blocks.
Inner block:
    SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5
    ℱ_{MAX SALARY}(σ_{DNO=5}(EMPLOYEE))
Outer block, where c stands for the result of the inner block:
    SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > c
    π_{LNAME, FNAME}(σ_{SALARY>c}(EMPLOYEE))
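The two-block decomposition can be tried out with a minimal sketch using Python's built-in sqlite3 module. The sample rows are hypothetical; only the table and column names come from the slide. The inner block is evaluated once to obtain the constant c, which is then passed to the outer block as a parameter.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (LNAME TEXT, FNAME TEXT, SALARY INTEGER, DNO INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", 30000, 5),
        ("Wong", "Franklin", 40000, 5),
        ("Wallace", "Jennifer", 43000, 4),
        ("Borg", "James", 55000, 1),
    ],
)

# Inner block: aggregate MAX(SALARY) over the tuples with DNO = 5, giving c.
(c,) = conn.execute("SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5").fetchone()

# Outer block: select on SALARY > c, then project LNAME and FNAME.
two_step = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ? ORDER BY LNAME", (c,)
).fetchall()

# The original nested query, for comparison.
nested = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE "
    "WHERE SALARY > (SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5) "
    "ORDER BY LNAME"
).fetchall()

print(c, two_step, two_step == nested)
```

Because the inner block does not refer to any attribute of the outer block, it only needs to be evaluated once.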
Using Heuristics in Query Optimization
Process for heuristic optimization
1. The parser translates the high-level query into an initial internal representation.
2. Heuristic rules are applied to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations, based on the access paths available on the tables involved in the query.
The main heuristic is to favor operations that reduce the size of intermediate results. For example, apply SELECT and PROJECT operations before JOIN or other binary operations.
Query tree: a tree data structure that corresponds to a relational algebra expression.
Leaf nodes: the input tables of the query.
Internal nodes: the relational algebra operations.
Executing a query tree consists of repeating two steps until the root node has been executed:
1. Execute an internal node operation whenever its operands are available.
2. Replace that internal node with the table produced by step 1.
Example Q1: for every project located in 'Stafford', find the project number, the number of the controlling department, and the last name, address, and birth date of that department's manager.
SQL for Q1:
    SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
    FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
    WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'Stafford';
Relational algebra:
    π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='Stafford'}(PROJECT)) ⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))
Notation for query trees and query graphs
Query tree: corresponds to the relational algebra expression for Q1
    π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='Stafford'}(PROJECT)) ⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))
[Figure: query tree for Q1. Leaf nodes: P, D, E. Internal tree nodes, from the bottom up: (1) σ_{P.PLOCATION='Stafford'} applied to P; (2) ⋈_{P.DNUM=D.DNUMBER} with D; (3) ⋈_{D.MGRSSN=E.SSN} with E; root: π_{P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE}]
The numbers (1), (2), (3) show the order in which the operations are carried out during query execution.
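A minimal sketch of bottom-up query tree execution for Q1 follows. The node classes, the `execute` function, and the sample rows are hypothetical illustrations, not part of any DBMS; only the relation and attribute names come from the slides. Each internal node is evaluated once its operands are available, and its result table takes its place as input to the parent node.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, object]
Table = List[Row]

@dataclass
class Leaf:            # leaf node: an input table
    table: Table

@dataclass
class Select:          # internal node: sigma
    predicate: Callable[[Row], bool]
    child: "Node"

@dataclass
class Join:            # internal node: theta join
    predicate: Callable[[Row, Row], bool]
    left: "Node"
    right: "Node"

@dataclass
class Project:         # internal node: pi (duplicates are kept in this sketch)
    attributes: List[str]
    child: "Node"

Node = Union[Leaf, Select, Join, Project]

def execute(node: Node) -> Table:
    """Evaluate the operands first, then the node itself."""
    if isinstance(node, Leaf):
        return node.table
    if isinstance(node, Select):
        return [r for r in execute(node.child) if node.predicate(r)]
    if isinstance(node, Join):
        left, right = execute(node.left), execute(node.right)
        return [{**l, **r} for l in left for r in right if node.predicate(l, r)]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attributes} for r in execute(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")

# Hypothetical sample rows.
PROJECT = [
    {"PNUMBER": 10, "DNUM": 4, "PLOCATION": "Stafford"},
    {"PNUMBER": 1, "DNUM": 5, "PLOCATION": "Bellaire"},
]
DEPARTMENT = [{"DNUMBER": 4, "MGRSSN": "987"}, {"DNUMBER": 5, "MGRSSN": "333"}]
EMPLOYEE = [
    {"SSN": "987", "LNAME": "Wallace", "ADDRESS": "291 Berry", "BDATE": "1941-06-20"},
    {"SSN": "333", "LNAME": "Wong", "ADDRESS": "638 Voss", "BDATE": "1955-12-08"},
]

q1_tree = Project(
    ["PNUMBER", "DNUM", "LNAME", "ADDRESS", "BDATE"],
    Join(                                                    # (3)
        lambda l, r: l["MGRSSN"] == r["SSN"],
        Join(                                                # (2)
            lambda l, r: l["DNUM"] == r["DNUMBER"],
            Select(lambda r: r["PLOCATION"] == "Stafford", Leaf(PROJECT)),  # (1)
            Leaf(DEPARTMENT),
        ),
        Leaf(EMPLOYEE),
    ),
)

q1_result = execute(q1_tree)
print(q1_result)
```

The recursion fixes one concrete execution order; a real engine would also choose an algorithm for each node.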
Query graph: a more neutral representation than the query tree.
[Figure: query graph for Q1, with relation nodes and constant nodes; the retrieved attributes, such as [E.LNAME, E.ADDRESS, E.BDATE], are shown in square brackets]
The graph does not indicate any particular order of operations for query execution.
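Query tree: the initial (canonical) query tree for the SQL query Q1
[Figure: canonical query tree for Q1. Leaf nodes P, D, E are combined by two × (Cartesian product) nodes, followed by a single σ node and a π node at the root]
    π_{P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE}(σ_{P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION='Stafford'}((P × D) × E))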
Heuristic optimization of a query tree
Example Q2: find the last names of employees born after 1957 who work on the project 'Aquarius'.
    SELECT LNAME
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE PNAME = 'Aquarius' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > ' ';
(a) Initial (canonical) query tree for SQL query Q2. It produces a very large file containing the Cartesian product of the entire EMPLOYEE, WORKS_ON, and PROJECT files.
(b) Move the SELECT operations down the query tree so that they run first. The query becomes more efficient because EMPLOYEE is already filtered by birth date before it is combined with the other files.
(c) Apply the more restrictive SELECT operation first.
(d) Replace each Cartesian product followed by a SELECT with a JOIN operation.
(e) As a further improvement, keep only the attributes needed by subsequent operations, by applying PROJECT (π) operations as early as possible.
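The effect of steps (a) to (e) on intermediate result sizes can be illustrated with a small Python sketch. The sample rows and the cutoff date `1957-12-31` are assumptions made for this example; the relation and attribute names follow Q2.

```python
from itertools import product

# Hypothetical sample rows; the cutoff date is an assumed value for "born after 1957".
EMPLOYEE = [
    {"SSN": "1", "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": "2", "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": "3", "LNAME": "English", "BDATE": "1972-07-31"},
]
WORKS_ON = [
    {"ESSN": "1", "PNO": 1},
    {"ESSN": "2", "PNO": 1},
    {"ESSN": "3", "PNO": 2},
    {"ESSN": "1", "PNO": 2},
]
PROJECT = [{"PNUMBER": 1, "PNAME": "Aquarius"}, {"PNUMBER": 2, "PNAME": "ProductY"}]
CUTOFF = "1957-12-31"

# (a) Canonical tree: full Cartesian product first, then one big SELECT.
cartesian = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
canonical = sorted(
    r["LNAME"]
    for r in cartesian
    if r["PNAME"] == "Aquarius"
    and r["PNUMBER"] == r["PNO"]
    and r["ESSN"] == r["SSN"]
    and r["BDATE"] > CUTOFF
)

# (b)-(e) Optimized tree: select and project early, join instead of product.
aquarius = [{"PNUMBER": p["PNUMBER"]} for p in PROJECT if p["PNAME"] == "Aquarius"]
workers = [
    {"ESSN": w["ESSN"]}
    for p in aquarius
    for w in WORKS_ON
    if p["PNUMBER"] == w["PNO"]
]
born_later = [
    {"SSN": e["SSN"], "LNAME": e["LNAME"]} for e in EMPLOYEE if e["BDATE"] > CUTOFF
]
optimized = sorted(
    e["LNAME"] for w in workers for e in born_later if w["ESSN"] == e["SSN"]
)

largest_intermediate = max(len(aquarius), len(workers), len(born_later))
print(len(cartesian), largest_intermediate, canonical, optimized)
```

Both plans return the same answer, but the canonical plan builds a 24-row intermediate result from 3, 4, and 2 input rows, while no intermediate result in the optimized plan exceeds 2 rows.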
General transformation rules for relational algebra operations:
1. Cascade of σ: a conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
    σ_{c1 AND c2 AND ... AND cn}(R) = σ_{c1}(σ_{c2}(...(σ_{cn}(R))...))
2. Commutativity of σ: the σ operation is commutative:
    σ_{c1}(σ_{c2}(R)) = σ_{c2}(σ_{c1}(R))
3. Cascade of π: in a cascade (sequence) of π operations, all but the last one can be ignored:
    π_{List1}(π_{List2}(...(π_{Listn}(R))...)) = π_{List1}(R)
4. Commuting σ with π: if the selection condition c involves only the attributes A1, ..., An in the projection list, the two operations can be commuted:
    π_{A1, A2, ..., An}(σ_c(R)) = σ_c(π_{A1, A2, ..., An}(R))
5. Commutativity of ⋈ (and ×): the ⋈ operation is commutative, as is the × operation:
    R ⋈_c S = S ⋈_c R;  R × S = S × R
6. Commuting σ with ⋈ (or ×): if all the attributes in the selection condition c involve only the attributes of one of the relations being joined, say R, the two operations can be commuted as follows:
    σ_c(R ⋈ S) = (σ_c(R)) ⋈ S
Alternatively, if the selection condition c can be written as (c1 AND c2), where condition c1 involves only the attributes of R and condition c2 involves only the attributes of S, the operations commute as follows:
    σ_c(R ⋈ S) = (σ_{c1}(R)) ⋈ (σ_{c2}(S))
7. Commuting π with ⋈ (or ×): suppose that the projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join condition c involves only attributes in L, the two operations can be commuted as follows:
    π_L(R ⋈_c S) = (π_{A1, ..., An}(R)) ⋈_c (π_{B1, ..., Bm}(S))
If the join condition c contains additional attributes not in L, these must be added to the projection lists, and a final π operation is needed.
8. Commutativity of set operations: the set operations ∪ and ∩ are commutative, but − is not.
9. Associativity of ⋈, ×, ∪, and ∩: these four operations are individually associative; that is, if θ stands for any one of these four operations (throughout the expression), we have
    (R θ S) θ T = R θ (S θ T)
10. Commuting σ with set operations: the σ operation commutes with ∪, ∩, and −. If θ stands for any one of these three operations (throughout the expression), we have
    σ_c(R θ S) = (σ_c(R)) θ (σ_c(S))
11. Commuting π with ∪: the π operation commutes with ∪:
    π_L(R ∪ S) = (π_L(R)) ∪ (π_L(S))
12. Converting a (σ, ×) sequence into ⋈: if the condition c of a σ that follows a × corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as follows:
    σ_c(R × S) = R ⋈_c S
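Some of these equivalences can be spot-checked on small data. The sketch below tests rules 1, 2, 6, and 12 on two hypothetical toy relations; the helper functions `select`, `cross`, and `join` are illustrative names rather than a library API. Relations are modeled as Python lists, so the comparison is also sensitive to tuple order, which is stricter than set equality. A check on one data set illustrates a rule but does not prove it.

```python
from itertools import product

# Hypothetical toy relations; the attribute names are illustrative only.
R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}, {"A": 3, "B": 30}]
S = [{"C": 1, "D": "x"}, {"C": 2, "D": "y"}, {"C": 2, "D": "z"}]

def select(condition, relation):
    """Selection: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

def cross(left, right):
    """Cartesian product of two relations with disjoint attribute names."""
    return [{**l, **r} for l, r in product(left, right)]

def join(condition, left, right):
    """Theta join: pair tuples and keep the pairs that satisfy the condition."""
    return [{**l, **r} for l, r in product(left, right) if condition({**l, **r})]

def c1(t):
    return t["A"] >= 2      # involves only attributes of R

def c2(t):
    return t["B"] < 30      # involves only attributes of R

def c1_and_c2(t):
    return c1(t) and c2(t)

def join_cond(t):
    return t["A"] == t["C"]

rule_checks = {
    # Rule 1: a conjunctive selection equals a cascade of selections.
    1: select(c1_and_c2, R) == select(c1, select(c2, R)),
    # Rule 2: selections commute.
    2: select(c1, select(c2, R)) == select(c2, select(c1, R)),
    # Rule 6: a selection on R's attributes can be pushed below the join.
    6: select(c1, join(join_cond, R, S)) == join(join_cond, select(c1, R), S),
    # Rule 12: a selection over a product is a join.
    12: select(join_cond, cross(R, S)) == join(join_cond, R, S),
}
print(rule_checks)
```

In the rule 6 check, the left side joins all of R before filtering, while the right side filters R first, so the join sees fewer tuples.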
Outline of a heuristic algebraic optimization algorithm:
Using rule 1, break up any select operations with conjunctive conditions into a cascade of select operations.
Using rules 2, 4, 6, and 10 concerning the commutativity of select with other operations, move each select operation as far down the query tree as is permitted by the attributes involved in the select condition.
Using rule 9 concerning the associativity of binary operations, rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.
Using rule 12, combine a Cartesian product operation with a subsequent select operation in the tree into a join operation.
Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting of project with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.
Summary of heuristics for algebraic optimization:
The main heuristic is to apply first the operations that reduce the size of intermediate results.
Perform select operations as early as possible to reduce the number of tuples, and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible.)
The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.)
Converting Query Trees into Query Execution Plans
Case: find the full name and address of all employees of the "Research" department.
To convert this query tree into an execution plan, the optimizer might do the following:
1. Choose an index search for the SELECT operation on DEPARTMENT (assuming one exists).
2. Choose a single-loop join algorithm that loops over the records in the result of the SELECT operation on DEPARTMENT for the join operation (assuming an index exists on the DNO attribute of EMPLOYEE).
3. Choose a scan of the JOIN result as input to the PROJECT operator.
The approach taken for executing the query may specify one of two types of evaluation:
MATERIALIZED evaluation
PIPELINED evaluation (preferred whenever feasible)
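The three choices above can be sketched in Python, with dictionaries standing in for the indexes. The sample rows, the helper `build_index`, and the attribute names DNAME, DNUMBER, and MINIT are assumptions made for this example; the slide itself names only the DEPARTMENT and EMPLOYEE relations and the DNO attribute.

```python
from collections import defaultdict

# Hypothetical sample rows; DNAME, DNUMBER, and MINIT are assumed attribute names.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]
EMPLOYEE = [
    {"FNAME": "John", "MINIT": "B", "LNAME": "Smith", "ADDRESS": "731 Fondren", "DNO": 5},
    {"FNAME": "Jennifer", "MINIT": "S", "LNAME": "Wallace", "ADDRESS": "291 Berry", "DNO": 4},
    {"FNAME": "Franklin", "MINIT": "T", "LNAME": "Wong", "ADDRESS": "638 Voss", "DNO": 5},
]

def build_index(rows, key):
    """Group rows by the value of one attribute (a stand-in for a real index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

dname_index = build_index(DEPARTMENT, "DNAME")   # assumed to exist (step 1)
dno_index = build_index(EMPLOYEE, "DNO")         # assumed to exist (step 2)

def research_employees():
    result = []
    # Step 1: index search for the SELECT on DEPARTMENT.
    for dept in dname_index.get("Research", []):
        # Step 2: single-loop join; loop over the selected DEPARTMENT records
        # and probe the EMPLOYEE index on DNO for each one.
        for emp in dno_index.get(dept["DNUMBER"], []):
            # Step 3: the PROJECT operator reads each joined record.
            result.append((emp["FNAME"], emp["MINIT"], emp["LNAME"], emp["ADDRESS"]))
    return result

plan_result = research_employees()
print(plan_result)
```

With the index on DNO, the join never scans EMPLOYEE records of other departments.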
1. Materialized evaluation
The result of an operation is stored as a temporary relation (the result is physically materialized).
For instance, the JOIN operation can be computed and the entire result stored as a temporary relation, which is then read as input by the algorithm that computes the PROJECT operation, which would produce the query result table.
2. Pipelined evaluation
As the resulting tuples of an operation are produced, they are forwarded directly to the next operation in the query sequence.
For example, as the selected tuples from DEPARTMENT are produced by the SELECT operation, they are placed in a buffer; the JOIN operation algorithm then consumes the tuples from the buffer, and the tuples that result from the JOIN operation are pipelined to the projection operation algorithm.
The advantage of pipelining is the cost savings: intermediate results do not have to be written to disk and read back for the next operation.
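The difference between the two evaluation styles can be illustrated with Python generators. In this sketch, in-memory lists stand in for temporary relations on disk. The sample rows, the function names, and the attribute names DNAME and DNUMBER are assumptions made for the example.

```python
# Hypothetical sample rows; DNAME and DNUMBER are assumed attribute names.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "ADDRESS": "731 Fondren", "DNO": 5},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "ADDRESS": "291 Berry", "DNO": 4},
    {"FNAME": "Franklin", "LNAME": "Wong", "ADDRESS": "638 Voss", "DNO": 5},
]

def select_department(name):
    """SELECT: yield matching DEPARTMENT tuples one at a time."""
    for dept in DEPARTMENT:
        if dept["DNAME"] == name:
            yield dept

def join_employee(departments):
    """JOIN: consume department tuples as they arrive and yield joined tuples."""
    for dept in departments:
        for emp in EMPLOYEE:
            if emp["DNO"] == dept["DNUMBER"]:
                yield {**dept, **emp}

def project(rows, attributes):
    """PROJECT: yield only the requested attributes of each tuple."""
    for row in rows:
        yield tuple(row[a] for a in attributes)

ATTRS = ["FNAME", "LNAME", "ADDRESS"]

# Materialized evaluation: every intermediate result is stored in full
# before the next operation starts.
selected = list(select_department("Research"))
joined = list(join_employee(selected))
materialized = list(project(joined, ATTRS))

# Pipelined evaluation: the operations are chained, and each tuple flows
# through SELECT, JOIN, and PROJECT without a stored intermediate table.
pipeline = project(join_employee(select_department("Research")), ATTRS)
first_tuple = next(pipeline)          # available before the join has finished
pipelined = [first_tuple] + list(pipeline)

print(materialized == pipelined, first_tuple)
```

Both styles return the same result; the pipelined version produces its first output tuple before the rest of the join has been computed.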