Web Teknologi I (MKB511C), Week 12
Topic:
– Text processing with Perl-compatible regular expressions (PCRE)
Specific instructional objective:
– Students can use the regular expression functions in PHP.
Agenda
– Basic PCRE syntax
– Regular expressions for finding patterns, replacing text, and splitting strings
PCRE
The PCRE functions are:
1. preg_filter: Perform a regular expression search and replace
2. preg_grep: Return array entries that match the pattern
3. preg_last_error: Returns the error code of the last PCRE regex execution
4. preg_match_all: Perform a global regular expression match
5. preg_match: Perform a regular expression match
6. preg_quote: Quote regular expression characters
7. preg_replace_callback: Perform a regular expression search and replace using a callback
8. preg_replace: Perform a regular expression search and replace
9. preg_split: Split string by a regular expression
Note: This extension maintains a global per-thread cache of compiled regular expressions (up to 4096).
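Three of the less familiar functions from this list can be sketched as follows. The file names and the search string are hypothetical sample data, not taken from the slides.

```php
<?php
// Hypothetical sample data.
$files = ["index.php", "style.css", "about.php", "logo.png"];

// preg_grep(): keep only the array entries that match the pattern.
// The original array keys are preserved.
$phpFiles = preg_grep('/\.php$/', $files);
print_r($phpFiles);

// preg_quote(): escape regex metacharacters so that arbitrary text
// is matched literally. The second argument also escapes the delimiter.
$needle  = "1+1=2";
$pattern = '/' . preg_quote($needle, '/') . '/';
$found   = preg_match($pattern, "We know that 1+1=2.");

// preg_last_error(): PREG_NO_ERROR after a call that ran without errors.
$ok = (preg_last_error() === PREG_NO_ERROR);
var_dump($found, $ok);
```

Without preg_quote() the "+" in the search string would be read as a quantifier and the literal text would not match.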
PCRE limitations
PCRE has some limits, but they are not expected to be reached in practice.
1. The maximum pattern length is about 64K data units; the size of a data unit depends on the library used (8-bit, 16-bit, or 32-bit library).
2. All values in repetition quantifiers must not exceed FFFFh (65535).
3. There is no limit on the number of subpatterns, but no more than 65535 of them may be capturing; the default is […].
4. The limit on forward references to subsequent subpatterns is […].
5. The maximum length of a subpattern name is 32 characters, and the maximum number of named subpatterns is […].
6. The maximum length of a name in (*MARK), (*PRUNE), (*SKIP), or (*THEN) is 255 for the 8-bit library and […] for the 16-bit and 32-bit libraries.
7. The maximum length of a subject string equals the largest value an integer can hold.
The preg_match function
Regular expressions are a powerful PHP feature for searching and modifying text.
Syntax: int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )
Syntax: int preg_match_all ( string $pattern, string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]]] )
Parameters of preg_match()
Parameters:
– pattern: The pattern to search for, as a string.
– subject: The input string.
– matches: If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
– flags: The only supported flag is PREG_OFFSET_CAPTURE. If it is passed, the string offset of every match is returned as well. Note that this changes matches into an array where every element is itself an array, with the matched string at index 0 and its offset into subject at index 1.
– offset: Normally, the search starts from the beginning of the subject string. The optional parameter offset specifies an alternate place from which to start the search (in bytes).
Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because the pattern can contain assertions such as ^, $ or (?<=x).
– Return values: preg_match() returns 1 if the pattern matches the given subject, 0 if it does not, or FALSE if an error occurred.
Warning: This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE (here, 0 for "no match"). See the PHP manual section on Booleans for more information. Use the === operator to test the return value of this function.
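The difference between the offset parameter and substr() can be seen in a short comparison. This is a minimal sketch; the subject string "abcdef" is an assumed sample value.

```php
<?php
$subject = "abcdef";   // assumed sample input
$pattern = '/^def/';

// With $offset = 3 the search starts at "d", but ^ still refers to the
// real start of $subject, so the pattern cannot match.
$withOffset = preg_match($pattern, $subject, $m1, PREG_OFFSET_CAPTURE, 3);

// substr() builds a new string "def" whose start is "d", so ^ matches.
$withSubstr = preg_match($pattern, substr($subject, 3), $m2, PREG_OFFSET_CAPTURE);

var_dump($withOffset);   // int(0)
var_dump($withSubstr);   // int(1)
print_r($m2);            // "def" found at offset 0 of the shortened string

// Distinguish "no match" (0) from "error" (false) with ===.
if ($withOffset === false) {
    print "PCRE error\n";
} elseif ($withOffset === 0) {
    print "no match\n";
}
```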
PREG constants
Constant: description (since)
– PREG_PATTERN_ORDER: Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on. This flag is only used with preg_match_all().
– PREG_SET_ORDER: Orders results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on. This flag is only used with preg_match_all().
– PREG_OFFSET_CAPTURE: See the description of PREG_SPLIT_OFFSET_CAPTURE.
– PREG_SPLIT_NO_EMPTY: Tells preg_split() to return only non-empty pieces.
– PREG_SPLIT_DELIM_CAPTURE: Tells preg_split() to capture parenthesized expressions in the delimiter pattern as well.
– PREG_SPLIT_OFFSET_CAPTURE: If this flag is set, the string offset of every match is returned as well. Note that this changes the return value into an array where every element is itself an array, with the matched string at index 0 and its offset within subject at index 1. This flag is only used with preg_split().
– PREG_NO_ERROR: Returned by preg_last_error() if there were no errors.
– PREG_INTERNAL_ERROR: Returned by preg_last_error() if there was an internal PCRE error.
– PREG_BACKTRACK_LIMIT_ERROR: Returned by preg_last_error() if the backtrack limit was exhausted.
– PREG_RECURSION_LIMIT_ERROR: Returned by preg_last_error() if the recursion limit was exhausted.
– PREG_BAD_UTF8_ERROR: Returned by preg_last_error() if the last error was caused by malformed UTF-8 data (only when running a regex in UTF-8 mode).
– PREG_BAD_UTF8_OFFSET_ERROR: Returned by preg_last_error() if the offset did not correspond to the beginning of a valid UTF-8 code point (only when running a regex in UTF-8 mode).
– PCRE_VERSION: PCRE version and release date (e.g. " Dec-2006"). Since 5.2.4.
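The error constants are typically checked after a function has returned FALSE. The sketch below provokes PREG_BAD_UTF8_ERROR on purpose; the invalid byte "\xFF" is an assumed test input.

```php
<?php
// The byte 0xFF never occurs in valid UTF-8, so a match in UTF-8 mode
// (the /u modifier) fails with an error instead of returning 0 or 1.
$result = preg_match('/./u', "\xFF");

if ($result === false) {
    // preg_last_error() reports why the last PCRE call failed.
    if (preg_last_error() === PREG_BAD_UTF8_ERROR) {
        print "malformed UTF-8 in subject\n";
    } else {
        print "other PCRE error\n";
    }
}
```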
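Quantifiers for Matching a Recurring Character
Symbol     Description                                         Example
*          Zero or more instances                              a*
+          One or more instances                               a+
?          Zero or one instance                                a?
{n}        Exactly n instances                                 a{3}
{n,}       At least n instances                                a{3,}
{0,n}      Up to n instances                                   a{0,2}
{n1,n2}    At least n1 instances, no more than n2 instances    a{1,2}
Note: Write the lower bound explicitly and do not put spaces inside the braces. The short form {,n} is only treated as a quantifier by recent PCRE2 releases (10.43 and later); older versions match it as literal text.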
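The quantifiers in the table can be tried out with a small helper. The function name fits() is hypothetical; it anchors the quantified pattern with ^ and $ so that the whole subject must be covered by the run of "a".

```php
<?php
// Hypothetical helper: true if $subject consists entirely of a run
// of characters allowed by the quantified pattern.
function fits(string $quantified, string $subject): bool
{
    return preg_match('/^' . $quantified . '$/', $subject) === 1;
}

var_dump(fits('a*', ''));          // true:  zero instances are allowed
var_dump(fits('a+', ''));          // false: at least one is required
var_dump(fits('a?', 'aa'));        // false: at most one is allowed
var_dump(fits('a{3}', 'aaa'));     // true:  exactly three
var_dump(fits('a{3,}', 'aaaaa'));  // true:  three or more
var_dump(fits('a{0,2}', 'aaa'));   // false: no more than two
var_dump(fits('a{1,2}', 'aa'));    // true:  between one and two
```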
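Matching literal characters and “.”
    print "\n";
    $pattern = "/aa/";
    $text = "aardvark advocacy";
    // preg_match() returns 1 on a match and stores the matched text in $array
    print preg_match($pattern, $text, $array) . "\n";
    print_r($array);
    print "\n";
    // "." matches any single character except a newline;
    // PREG_OFFSET_CAPTURE also records where the match starts
    print preg_match("/d./", "aardvark advocacy", $array, PREG_OFFSET_CAPTURE) . "\n";
    print_r($array);
    print "\n";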
Matching repetition
    // {2,3} matches two or three consecutive "a" characters
    $pattern = "/a{2,3}/";
    print preg_match($pattern, "aaardvark advocacy", $array, PREG_OFFSET_CAPTURE) . "\n";
    print_r($array);
Matching with “*”
    // ".*" is greedy: it matches as many characters as possible
    $pattern = "/p.*t/";
    $text = "pot post pat patent term";
    if (preg_match($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
Matching with “?”
    // ".*?" is lazy: it matches as few characters as possible
    $pattern = "/p.*?t/";
    $text = "pot post pat patent term";
    if (preg_match($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
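Running the greedy and the lazy pattern on the same text makes the difference visible. This sketch reuses the sample text from the slides; the variable names $greedy and $lazy are introduced here for illustration.

```php
<?php
$text = "pot post pat patent term";

// Greedy: ".*" takes as much as possible, then backs off to the last "t".
preg_match('/p.*t/', $text, $greedy);

// Lazy: ".*?" takes as little as possible and stops at the first "t".
preg_match('/p.*?t/', $text, $lazy);

print $greedy[0] . "\n";   // pot post pat patent t
print $lazy[0] . "\n";     // pot
```

The greedy match ends at the "t" of "term" because that is the last "t" in the string.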
Matching with character ranges
A character class can combine ranges and single characters, for example any lowercase letter or the digits 3, 4, and 7:
    $pattern = "/[a-z347]+/";
    $text = "AB dkfd773sxFF";
    if (preg_match($pattern, $text, $array)) {
        print "\n";
        print_r($array);   // first match: "dkfd773sx"
        print "\n";
    }
A “^” at the start of a character class inverts it:
    $pattern = "/[^a-z347]+/";
    $text = "AB dkfd773sxFF";
    if (preg_match($pattern, $text, $array)) {
        print "\n";
        print_r($array);   // first match: "AB " (including the space)
        print "\n";
    }
Matching with backslashed characters
Char    Matches
\d      Any digit
\D      Anything other than a digit
\s      Any kind of whitespace
\S      Anything other than whitespace
\w      Alphanumeric characters (including the underscore character)
\W      Anything other than an alphanumeric character or an underscore
    // Explicit character class, equivalent to \w
    $pattern = "/p[a-zA-Z0-9_]+t/";
    $text = "HIV patogi pattern";
    //if (preg_match ( $pattern, $text, $array) ) {
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
    // The same pattern written with \w
    $pattern = "/p\w+t/";
    $text = "HIV pa_togi paxxxxxttern";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
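The remaining classes in the table work the same way as \w. A minimal sketch with \d, \s, and \W follows; the input text is an assumed sample.

```php
<?php
$text = "Room 12,  floor 3";   // assumed sample input (note the double space)

// \d+ : every run of digits
preg_match_all('/\d+/', $text, $numbers);
print_r($numbers[0]);          // "12" and "3"

// \s+ : split on any run of whitespace
$words = preg_split('/\s+/', $text);
print_r($words);               // "Room", "12,", "floor", "3"

// \W : remove everything that is not a letter, digit, or underscore
$compact = preg_replace('/\W/', '', $text);
print $compact . "\n";         // Room12floor3
```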
Escape characters as anchors
Char    Matches
\A      Beginning of string
\b      Word boundary
\B      Not a word boundary
\Z      End of string (matches before final newline or at end of string)
\z      End of string (matches only at very end of string)
    // \b ... \b : whole words that start with "p" and end with "t"
    $pattern = "/\bp\w+t\b/";
    $text = "pot post pair pat patent";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
    // "\." matches a literal dot instead of "any character"
    $pattern = "/\./";
    $text = "pot post. pair pat. patent";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
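The difference between \Z and \z, and between \A and “^”, only shows up when newlines are involved. The sketch below uses assumed sample strings to illustrate it.

```php
<?php
$line = "end\n";   // assumed input with a trailing newline

$upperZ = preg_match('/end\Z/', $line);   // 1: \Z also matches before a final newline
$lowerZ = preg_match('/end\z/', $line);   // 0: \z matches only at the very end

$twoLines = "x\nend";
$withA     = preg_match('/\Aend/m', $twoLines);   // 0: \A is always the start of the string
$withCaret = preg_match('/^end/m', $twoLines);    // 1: with /m, ^ matches at each line start

var_dump($upperZ, $lowerZ, $withA, $withCaret);
```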
Working with subpatterns 1
    // Each pair of parentheses captures a subpattern:
    // $array[1] holds "don't", $array[2] holds "panic"
    $pattern = "/(don't)\s+(panic)/";
    $text = "Whatever you do, don't panic!";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
Working with subpatterns 2
    // Four groups of digits separated by literal dots
    $pattern = "/(\d+)\.(\d+)\.(\d+)\.(\d+)/";
    $text = " ";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
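The sample text for this pattern is empty in the slide. As an illustration only, the sketch below feeds it an assumed sentence containing a dotted address from the documentation range (192.0.2.10).

```php
<?php
$pattern = '/(\d+)\.(\d+)\.(\d+)\.(\d+)/';
$text = "The server listens on 192.0.2.10 today";   // assumed sample input

if (preg_match_all($pattern, $text, $array)) {
    print_r($array[0]);   // full match: 192.0.2.10
    // Each captured group holds one part of the address.
    print "parts: {$array[1][0]} {$array[2][0]} {$array[3][0]} {$array[4][0]}\n";
}
```

The pattern only checks the shape "digits.digits.digits.digits"; it does not verify that each part is in the range 0 to 255.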
Working with the pipe “|”
The pipe “|” combines alternative patterns: the expression matches if any one of the alternatives matches.
    $pattern = "/www\.example(\.com|\.co\.uk)/";
    $text = " ";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
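The sample text for the alternation pattern is missing from the slide. The sketch below runs the pattern against an assumed sentence that contains both host names.

```php
<?php
$pattern = '/www\.example(\.com|\.co\.uk)/';
$text = "Visit www.example.com or www.example.co.uk";   // assumed sample input

if (preg_match_all($pattern, $text, $array)) {
    print_r($array[0]);   // full matches: both host names
    print_r($array[1]);   // the alternative that matched: ".com" and ".co.uk"
}
```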
Anchoring in regular expressions
The “^” symbol anchors the pattern to the start of the string (with the /m modifier, to the start of each line).
The “$” symbol anchors the pattern to the end of the string (with the /m modifier, to the end of each line).
    // Matches only because the string starts with "a"
    $pattern = "/^a/";
    $text = "apple banana flea dear";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
    // Matches only because the string ends with "ea"
    $pattern = "/ea$/";
    $text = "apple banana dear flea";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }
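Multidimensional arrays
    // PREG_SET_ORDER groups the results per match:
    // $array[0] is the first match with its subpatterns, $array[1] the second, ...
    $pattern = "/(\d+)-(\d+)-(\d+)/";
    $text = " , , ";
    if (preg_match_all($pattern, $text, $array, PREG_SET_ORDER)) {
        print "\n";
        print_r($array);
        print "\n";
    }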
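The values in the sample text are missing from the slide. With three assumed dash-separated dates, the two result layouts of preg_match_all() can be compared as follows.

```php
<?php
$pattern = '/(\d+)-(\d+)-(\d+)/';
$text = "01-05-99, 01-10-99, 03-03-00";   // assumed sample input

// PREG_SET_ORDER: one sub-array per match.
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] to $set[3] are the captured groups.
    print "$set[0]: first=$set[1], second=$set[2], third=$set[3]\n";
}

// PREG_PATTERN_ORDER (the default): one sub-array per subpattern.
preg_match_all($pattern, $text, $columns, PREG_PATTERN_ORDER);
print_r($columns[1]);   // the first group of every match: 01, 01, 03
```

PREG_SET_ORDER is usually the more convenient layout when each match is processed as one record.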
Replacing text
    $pattern = "/Sarah Williams/";
    $text = "Our Secretary, Sarah Williams is pleased to welcome you.";
    $replace = "Rev. P.W. Goodchild";
    // preg_replace() returns the subject with every match replaced
    $text = preg_replace($pattern, $replace, $text);
    print $text;
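Converting dd/mm/yyyy to mm/dd/yyyy with preg_replace()
    // "|" is used as the delimiter so that "/" needs no escaping
    $pattern = "|(\d+)/(\d+)/(\d+)|";
    $text = "25/12/2000";
    // $1, $2, $3 refer to the captured subpatterns
    $replace = '$2/$1/$3';
    $text = preg_replace($pattern, $replace, $text);
    print $text;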
Another text conversion example
    $text = "25/12/99, 14/5/00. Copyright 2003";
    // Arrays of patterns and replacements are applied pairwise, in order
    $regs = array( "|\b(\d+)/(\d+)/(\d+)\b|", "/([Cc]opyright) 2003/" );
    $reps = array( '$2/$1/$3', '$1 2004' );
    $text = preg_replace($regs, $reps, $text);
    print "$text ";
PCRE modifiers
Pattern    Description
/i         Case insensitive.
/e         Treats the replacement string in preg_replace() as PHP code. Deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead.
/m         $ and ^ anchors match at newlines as well as at the beginning and end of the string.
/s         "." also matches newlines (newlines are not normally matched by ".").
/x         Whitespace in the pattern outside character classes is ignored, to aid readability. To match whitespace, use \s, \t, or an escaped space.
/A         Matches pattern only at the start of the string (this modifier is not found in Perl).
/D         $ matches only at the very end of the string, not before a final newline (this modifier is not found in Perl). PHP has no /E modifier.
/U         Makes the regular expression ungreedy; the minimum number of allowable matches is found (this modifier is not found in Perl).
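A few of the modifiers can be demonstrated in one short script. This is a sketch with assumed sample strings; because the /e modifier no longer exists in PHP 7 and later, the last step uses preg_replace_callback() for the same purpose, and the D modifier is used for "end of string only".

```php
<?php
// i: case-insensitive matching
$i = preg_match('/php/i', "I love PHP");

// s: "." also matches a newline
$withS    = preg_match('/a.b/s', "a\nb");
$withoutS = preg_match('/a.b/', "a\nb");

// U: quantifiers become ungreedy
preg_match('/<.+>/U', "<b>bold</b>", $u);

// x: whitespace inside the pattern is ignored
$x = preg_match('/ \d{2} - \d{4} /x', "12-2000");

// D: "$" no longer matches before a final newline
$withD    = preg_match('/end$/D', "end\n");
$withoutD = preg_match('/end$/', "end\n");

// Replacement computed by PHP code, without the removed /e modifier
$title = preg_replace_callback(
    '/\b[a-z]/',
    function ($m) { return strtoupper($m[0]); },
    "hello world"
);

var_dump($i, $withS, $withoutS, $u[0], $x, $withD, $withoutD, $title);
```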
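Example of using a modifier
    // With /m, ^ and $ match at the start and end of every line,
    // so each "key: value" line produces its own match
    $pattern = "/^\w+:\s+(.*)$/m";
    $text = "name: matt\noccupation: coder\neyes: blue\n";
    if (preg_match_all($pattern, $text, $array)) {
        print "\n";
        print_r($array);
        print "\n";
    }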
Using preg_split()
    // Split on ", " or on " and "
    $pattern = "/, | and /";
    $text = "apples, oranges, peaches and grapefruit";
    $fruitarray = preg_split($pattern, $text);
    print "\n";
    print_r($fruitarray);
    print "\n";
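preg_split() also accepts a limit and the PREG_SPLIT_* flags. The following sketch shows three common variations; the inputs other than the fruit list are assumed sample strings.

```php
<?php
// limit = 2: stop after the first split, the rest stays in one piece
$limited = preg_split('/, | and /', "apples, oranges, peaches and grapefruit", 2);
print_r($limited);

// PREG_SPLIT_NO_EMPTY: drop the empty piece between the two commas
$parts = preg_split('/,\s*/', "x, y,,z", -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);

// PREG_SPLIT_DELIM_CAPTURE: keep the parenthesized delimiter in the result
$tokens = preg_split('/([+-])/', "10+2-3", -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);
```

A limit of -1 means "no limit" and is needed here only so that the flags argument can be passed.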
Review & Exercises W12
1. Create a PHP file containing the following form:
1. Write a regular expression to extract the address , sample text: Telephone: (021)