06 - Regular Expressions
Regular expression operators
Operator | Description |
---|---|
m/PATTERN/ | Returns true if PATTERN is found in $_. |
s/PATTERN/REPLACEMENT/ | Replaces the text matched by PATTERN with REPLACEMENT. |
tr/CHARS/REPLACEMENTS/ | Replaces each character listed in CHARS with the corresponding character in REPLACEMENTS. |
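The tr operator from the table above can be sketched as follows. This is only an illustration: the sample string and the variable names $text, $count_o and $upper are made up and do not come from the original notes.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = "hello world";

# tr returns the number of characters it matched. With an empty
# replacement list the string is left unchanged, so this just counts "o".
my $count_o = ($text =~ tr/o//);

# Copy $text into $upper, then map each lowercase letter to uppercase.
(my $upper = $text) =~ tr/a-z/A-Z/;

print "$count_o\n";   # 2
print "$upper\n";     # HELLO WORLD
```

Unlike m and s, tr works on plain character lists, so regular expression meta-characters have no special meaning inside it.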
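Regular expression delimiters
Every regular expression must be enclosed in delimiters. By default the pattern after m is enclosed in two slashes, but almost any other non-whitespace character may serve as the delimiter. The opening delimiter must follow m immediately, and the matching closing delimiter ends the regular expression. With bracket-style delimiters such as braces, the closing bracket ends it. Example: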
        m{/root/home/bubko/}
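A small sketch of why alternative delimiters are convenient: the same path pattern written with slashes and with braces. The file name in $path and the variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical path used only for this example.
my $path = "/root/home/bubko/notes.txt";

# With slash delimiters, every slash inside the pattern must be escaped.
my $with_slashes = ($path =~ m/\/root\/home\/bubko\//) ? 1 : 0;

# With braces as delimiters, the slashes need no escaping.
my $with_braces = ($path =~ m{/root/home/bubko/}) ? 1 : 0;

print "slashes: $with_slashes, braces: $with_braces\n";   # slashes: 1, braces: 1
```

Both patterns match the same text; the brace form is simply easier to read.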
The match operator (m//)
The match operator searches a string for a pattern. It is used very often when scanning through files. Examples:
       # Match a literal pattern against $_.
       $_ = "AAA bbb AAA";
       print "Found bbb\n" if m/bbb/;
       # The pattern may also come from an interpolated variable.
       $chcemNajst = "bbb";
       $_ = "AAA bbb AAA";
       print "Found bbb\n" if m/$chcemNajst/;
Match operator options
Option | Description |
---|---|
g | Finds all occurrences of the pattern in the string. In list context it returns a list of the matches; in scalar context each call finds the next occurrence, so the matches can be walked one at a time in a loop. |
i | Ignores the difference between uppercase and lowercase letters. |
m | Treats the string as multiple lines: ^ and $ also match at the start and end of each line inside the string, not only at the start and end of the whole string. |
o | Compiles the pattern only once. In a loop it should be used only when the variables interpolated into the pattern are certain not to change. |
s | Treats the string as a single line: the . meta-character also matches a newline. |
x | Allows the extended syntax for regular expressions: whitespace and comments inside the pattern are ignored, so the expression can be laid out in a structured, more readable way. |
Example:
       $_ = "AAA BBB AAA";
       print "Found bbb\n" if m/bbb/i;
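The g, m, i and x options can be sketched together like this. The two-line sample string and all variable names are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "AAA bbb AAA\nccc AAA";

# g in list context returns every match at once.
my @all = ($text =~ m/AAA/g);

# g in scalar context finds one match per call, so it can drive a loop.
# $-[0] holds the start offset of the most recent match.
my @offsets;
while ($text =~ m/AAA/g) {
    push @offsets, $-[0];
}

# m lets ^ match at the start of every line, not only of the whole string.
my @line_starts = ($text =~ m/^(\w+)/mg);

# i and x together: ignore case, and allow whitespace and comments
# inside the pattern.
my $found = ($text =~ m/
    bbb    # the word we are after
    \s+    # at least one whitespace character
    aaa    # matches "AAA" because case is ignored
/ix) ? 1 : 0;

print scalar(@all), "\n";   # 3
print "@offsets\n";         # 0 8 16
print "@line_starts\n";     # AAA ccc
print "$found\n";           # 1
```

One caveat when using x: a comment inside the pattern must not contain the delimiter character, because the delimiter still ends the pattern.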
The substitution operator (s///)
The substitution operator changes the contents of a string. It takes two operands in the form s/OPERAND1/OPERAND2/: the pattern to find and the replacement text. Example:
       $chcemNahradit = "bbb";
       $nahradTextom = "1234567890";
       $_ = "AAA bbb AAA";
       # $vysledok holds the number of substitutions made (1); the changed string is in $_.
       $vysledok = s/$chcemNahradit/$nahradTextom/;
The substitution operator is also used quite often to delete a substring. Example:
        s/bbb//;
If a bracket-type character is used as the delimiter around the search pattern, a second pair of brackets must be used to enclose the replacement. Example:
       $_ = "AAA bbb AAA";
       $vysledok = s{bbb}{1234567890};
Substitution operator options
Option | Description |
---|---|
e | Forces OPERAND2 to be evaluated as a Perl expression. |
g | Replaces all occurrences of the pattern in the string. |
i | Ignores letter case. |
m | Treats the string as multiple lines: ^ and $ also match at the start and end of each line. |
o | Compiles the pattern only once. Be careful when using it in loops. |
s | Treats the string as a single line: . also matches a newline. |
x | Allows the extended syntax. |
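A brief sketch of the g and e options from the table above. The sample strings and variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $text = "AAA bbb AAA";

# Without g only the first occurrence is replaced.
(my $first_only = $text) =~ s/AAA/X/;

# With g every occurrence is replaced; the return value is the
# number of substitutions that were made.
my $all   = $text;
my $count = ($all =~ s/AAA/X/g);

# With e the replacement is evaluated as Perl code for each match.
my $prices = "apple 3, pear 5";   # hypothetical data
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$first_only\n";   # X bbb AAA
print "$all ($count)\n"; # X bbb X (2)
print "$doubled\n";      # apple 6, pear 10
```

Copying the string first, as in (my $copy = $original) =~ s/.../.../, keeps the original value intact.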
The binding operators (=~ and !~)
By default, regular expressions operate on the contents of the $_ variable. To work on a different variable, use one of the binding operators: =~ applies the operator to the given variable, and !~ does the same but negates the result of the match. Example:
       $premenna = "Jozko has a new computer";
       print("Yes, Jozko has a computer\n") if $premenna =~ m/computer/;
       # !~ is true only when the pattern does NOT match, so nothing is printed here.
       print("Jozko's computer is not new\n") if $premenna !~ m/new/;
       $premenna =~ s/Jozko/Ferko/;
       print("\$premenna = $premenna\n");
Building regular expressions
Meta-Character | Description |
---|---|
^ | This meta-character - the caret - will match the beginning of a string or if the /m option is used, matches the beginning of a line. It is one of two pattern anchors - the other anchor is the $. |
. | This meta-character will match any character except for the newline unless the /s option is specified. If the /s option is specified, then the newline will also be matched. |
$ | This meta-character will match the end of a string or if the /m option is used, matches the end of a line. It is one of two pattern anchors - the other anchor is the ^. |
| | This meta-character - called alternation - lets you specify two values that can cause the match to succeed. For instance, m/a|b/ means that the $_ variable must contain the "a" or "b" character for the match to succeed. |
* | This meta-character indicates that the "thing" immediately to the left should be matched 0 or more times in order to be evaluated as true. |
+ | This meta-character indicates that the "thing" immediately to the left should be matched 1 or more times in order to be evaluated as true. |
? | This meta-character indicates that the "thing" immediately to the left should be matched 0 or 1 times in order to be evaluated as true. When it directly follows the +, *, ?, or {n, m} quantifiers, it makes them non-greedy, so the regular expression matches the smallest possible string. |
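The difference between greedy and non-greedy matching, together with the anchors and alternation, might be illustrated like this. The sample strings and variable names are assumptions made for the sketch.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $html = "<b>one</b> and <b>two</b>";

# Greedy: .* takes as much as it can, up to the LAST closing tag.
my ($greedy) = $html =~ m{<b>(.*)</b>};

# Non-greedy: .*? stops at the FIRST closing tag it can reach.
my ($lazy) = $html =~ m{<b>(.*?)</b>};

# Anchors: ^ and $ tie the pattern to both ends of the string,
# and . stands for any single character.
my $whole = ("abc"   =~ m/^a.c$/) ? 1 : 0;
my $part  = ("xabcx" =~ m/^a.c$/) ? 1 : 0;

# Alternation: either word is accepted.
my ($pet) = "I have a dog" =~ m/(cat|dog)/;

print "$greedy\n";        # one</b> and <b>two
print "$lazy\n";          # one
print "$whole $part\n";   # 1 0
print "$pet\n";           # dog
```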
Meta-Brackets | Description |
---|---|
() | The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section "Pattern Memory" later in this chapter for more information. |
(?...) | If a question mark immediately follows the left parentheses, it indicates that an extended mode component is being specified. See the section "Example: Extension Syntax" later in this chapter for more information. |
{n, m} | The curly braces let you specify how many times the "thing" immediately to the left should be matched. {n} means that it should be matched exactly n times. {n,} means it must be matched at least n times. {n, m} means that it must be matched at least n times and not more than m times. |
[] | The square brackets let you create a character class. For instance, m/[abc]/ will evaluate to true if any of "a", "b", or "c" is contained in $_. The square brackets are a more readable alternative to the alternation meta-character. |
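A short sketch of the bracket constructs from the table above: parentheses as pattern memory, a character class, and a counted repetition. The date string and the variable names are made up for the example. The quantifier is written as {2,3} with no space after the comma, the form accepted by all Perl 5 releases.

```perl
use strict;
use warnings;

# Hypothetical date string.
my $date = "2024-03-15";

# Each pair of parentheses remembers the text it matched; in list
# context the match returns the remembered pieces in order.
my ($year, $month, $day) = $date =~ m/^(\d{4})-(\d{2})-(\d{2})$/;

# A character class matches any one of the listed characters.
# The "= () =" idiom counts the matches returned in list context.
my $vowels = () = "regular expression" =~ m/[aeiou]/g;

# {2,3} accepts two or three repetitions, but not four.
my $two_or_three = ("aaa"  =~ m/^a{2,3}$/) ? 1 : 0;
my $too_many     = ("aaaa" =~ m/^a{2,3}$/) ? 1 : 0;

print "$year / $month / $day\n";       # 2024 / 03 / 15
print "$vowels\n";                     # 7
print "$two_or_three $too_many\n";     # 1 0
```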
Meta-Sequences | Description |
---|---|
\ | This meta-character "escapes" the following character. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign in a pattern, you must use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character in your pattern. |
\nnn | Any Octal byte. Use zero padding for values from \000 to \077 inclusively. For larger values simply use the three-digit number (like \100 or \323). |
\a | Alarm. |
\A | This meta-sequence represents the beginning of the string. Its meaning is not affected by the /m option. |
\b | This meta-sequence represents the backspace character inside a character class; otherwise, it represents a word boundary. A word boundary is the spot between word (\w) and non-word (\W) characters. For the purpose of finding word boundaries, the positions just off either end of the string are treated as if they held non-word characters. |
\B | Match a non-word boundary. |
\cn | Any control character. |
\d | Match a single digit character. |
\D | Match a single non-digit character. |
\e | Escape. |
\E | Terminate the \L or \U sequence. |
\f | Form Feed. |
\G | Match only where the previous m//g left off. |
\l | Change the next character to lowercase. |
\L | Change the following characters to lowercase until a \E sequence is encountered. |
\n | Newline. |
\Q | Quote Regular Expression meta-characters literally until the \E sequence is encountered. |
\r | Carriage Return. |
\s | Match a single whitespace character. |
\S | Match a single non-whitespace character. |
\t | Tab. |
\u | Change the next character to uppercase. |
\U | Change the following characters to uppercase until a \E sequence is encountered. |
\v | Vertical Tab. In Perl 5.10 and later, \v inside a pattern matches any vertical whitespace character instead. |
\w | Match a single word character. Word characters are the alphanumeric and underscore characters. |
\W | Match a single non-word character. |
\xnn | Any Hexadecimal byte. |
\Z | This meta-sequence represents the end of the string, or the position just before a newline at the very end of the string. Its meaning is not affected by the /m option. |
\$ | Dollar Sign. |
\@ | At sign. |
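A few of the meta-sequences above in action, as a rough sketch. The sample strings and variable names are assumptions for the illustration.

```perl
use strict;
use warnings;

# \b: match "cat" only as a whole word, not inside "concatenate".
my $text = "cat concatenate cat.";
my $whole_words = () = $text =~ m/\bcat\b/g;

# \Q...\E: treat an interpolated string literally, so "." loses
# its "any character" meaning.
my $literal  = "a.c";
my $unquoted = ("abc" =~ m/$literal/)     ? 1 : 0;
my $quoted   = ("abc" =~ m/\Q$literal\E/) ? 1 : 0;

# \u uppercases the next character; \U...\E uppercases a whole span.
(my $title = "hello world") =~ s/(\w+)/\u$1/g;
(my $shout = "hello world") =~ s/(\w+)/\U$1\E!/;

# \d+ collects runs of digits.
my @digits = "a1b22c" =~ m/(\d+)/g;

print "$whole_words\n";         # 2
print "$unquoted $quoted\n";    # 1 0
print "$title\n";               # Hello World
print "$shout\n";               # HELLO! world
print "@digits\n";              # 1 22
```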
Quantifier | Description |
---|---|
* | The component must be present zero or more times. |
+ | The component must be present one or more times. |
? | The component must be present zero or one times. |
{n} | The component must be present n times. |
{n,} | The component must be present at least n times. |
{n,m} | The component must be present at least n times and no more than m times. |
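Example:
       $_ = "AA AB AC AD AE";
       m/^(\w+(\W+|$)){5}$/;
This example matches only if the string consists of exactly five words. Each repetition looks for one or more word characters (\w+) followed by either one or more non-word characters (\W+) or the end of the string ($). The alternative with $ is needed for the last word: \W must consume a real character, so it cannot match the end of the string by itself.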
Extension | Description |
---|---|
(?# TEXT) | This extension lets you add comments to your regular expression. The TEXT value is ignored. |
(?:...) | This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used. |
(?=...) | This extension lets you match values without including them in the $& variable. |
(?!...) | This extension lets you specify what should not follow your pattern. For instance, /blue(?!bird)/ means that "bluebox" and "bluesy" will be matched but not "bluebird". |
(?sxi) | This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable interpolation to do the matching. |
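The extensions in the table above could be exercised roughly as follows. All sample strings and variable names are invented for this sketch.

```perl
use strict;
use warnings;

# (?!...): "blue" must not be followed by "bird".
my $text = "bluebird bluebox bluesy";
my @not_bird = $text =~ m/(blue(?!bird)\w*)/g;

# (?=...): the digits must be followed by "EUR", but "EUR" itself
# is not part of the matched text.
my ($amount) = "100USD 200EUR" =~ m/(\d+)(?=EUR)/;

# (?:...): groups without using a pattern memory slot, so only
# the month is returned.
my @caps = "2024-03" =~ m/(?:\d{4})-(\d{2})/;

# (?i): an option embedded in a pattern that is stored in a variable.
my $pattern = "(?i)hello";
my $ci = ("HELLO" =~ m/$pattern/) ? 1 : 0;

# (?# ...): an inline comment that the regex engine ignores.
my $commented = ("abc" =~ m/a(?# then any single character).c/) ? 1 : 0;

print "@not_bird\n";   # bluebox bluesy
print "$amount\n";     # 200
print "@caps\n";       # 03
print "$ci\n";         # 1
print "$commented\n";  # 1
```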