06 - Regular Expressions

Regular expression operators

Operator Description
m/PATTERN/ Returns true if PATTERN is found in $_.
s/PATTERN/REPLACEMENT/ Replaces PATTERN in $_ with REPLACEMENT.
tr/CHARS/REPLACEMENTS/ Replaces each character listed in CHARS with the corresponding character listed in REPLACEMENTS.
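All three operators work on $_ when no other variable is given. The following is a minimal sketch; the sample string and the replacement characters are illustrative only. Note that tr/// works character by character rather than with patterns, and returns the number of characters it replaced.

```perl
use strict;
use warnings;

$_ = "AAA bbb AAA";

# m// returns true when the pattern occurs in $_
my $found = m/bbb/ ? 1 : 0;

# s/// replaces the first occurrence of the pattern in $_
s/bbb/ccc/;
my $after_s = $_;            # "AAA ccc AAA"

# tr/// translates characters one for one (every "A" becomes "x")
# and returns how many characters it changed
my $count = tr/A/x/;
my $after_tr = $_;           # "xxx ccc xxx"

print "found=$found\n";
print "after s///:  $after_s\n";
print "after tr///: $after_tr ($count characters)\n";
```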

Regular expression delimiters

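Every regular expression must be enclosed between delimiters. Normally two slashes follow the m, but almost any other punctuation character may be used instead. The delimiter must come immediately after the m and again at the end of the whole regular expression. If an opening bracket such as {, (, [ or < is used, the pattern is closed with the matching closing bracket, which in the example below saves escaping every slash in the path.
Example: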
        m{/root/home/bubko/}
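A short sketch, with an illustrative path stored in $_ (the file name notes.txt is made up), showing that the default slashes, a pair of braces, and another punctuation character (here #) all give the same match; only the slash version needs every / in the pattern escaped.

```perl
use strict;
use warnings;

$_ = "/root/home/bubko/notes.txt";

# Default slashes: every "/" inside the pattern must be escaped as "\/"
my $with_slashes = m/\/root\/home\/bubko\// ? 1 : 0;

# Braces as delimiters: the slashes can be written as they are
my $with_braces = m{/root/home/bubko/} ? 1 : 0;

# Another punctuation character, here "#", works the same way
my $with_hash = m#/root/home/bubko/# ? 1 : 0;

print "$with_slashes $with_braces $with_hash\n";   # 1 1 1
```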

The match operator (m//)

The match operator searches a string for a given pattern. It is used very often when searching through files.
Example:
       $_ = "AAA bbb AAA";
       print "Found bbb\n" if m/bbb/;

       $chcemNajst = "bbb";
       $_ = "AAA bbb AAA";
       print "Found bbb\n" if m/$chcemNajst/;
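A sketch of the typical file-searching loop. So that it runs on its own, it assumes the file contents are held in a string and opened as an in-memory filehandle; the sample lines and the file name in the comment are illustrative. Each line read by while (<$fh>) is placed in $_, so m// can be used without naming a variable.

```perl
use strict;
use warnings;

# Hypothetical file contents; a real script would open a file on disk,
# e.g. open(my $fh, '<', 'data.txt') (the file name is only an example)
my $text = "AAA bbb AAA\nCCC ddd CCC\nbbb at start\n";
open(my $fh, '<', \$text) or die "Cannot open: $!";

my $chcemNajst = "bbb";
my @matching;
while (<$fh>) {              # each line is read into $_
    chomp;                   # strip the trailing newline from $_
    push @matching, $_ if m/$chcemNajst/;
}
close($fh);

print "$_\n" for @matching;
```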

Match operator options

Option Description
g Finds all occurrences of the pattern in the string. In list context it returns a list of all matches; in scalar context each evaluation finds the next match, so a loop can step through the matches one at a time.
i Makes the match case-insensitive.
m Makes Perl treat the string being searched as multiple lines, so ^ and $ match at the start and end of each line, not only of the whole string.
o Compiles the pattern only once. Inside a loop, use it only when it is certain that the values of the interpolated variables do not change.
s Treats the string as a single line, so . also matches a newline.
x Enables extended regular-expression syntax: whitespace and # comments inside the pattern are ignored, so a pattern can be laid out in a structured, more readable way.
Example:
       $_ = "AAA BBB AAA";
       print "Found bbb\n" if m/bbb/i;
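The /g and /x options are easiest to understand in action. A minimal sketch on an illustrative string: /g in list context collects every match, /g in scalar context drives a loop (pos() reports the offset just after each match), and /x lets the pattern be spread over several commented lines.

```perl
use strict;
use warnings;

$_ = "AAA bbb AAA BBB";

# /g in list context: every match is returned at once
my @all = m/AAA/g;                 # ("AAA", "AAA")

# /g in scalar context: each call continues after the previous match,
# combined here with /i so both "bbb" and "BBB" are found
my @positions;
while (m/b+/gi) {
    push @positions, pos();        # offset just after the match
}

# /x: whitespace and comments inside the pattern are ignored
my $ok = m/
    AAA      # three capital A's
    \s+      # some whitespace
    bbb      # then three lowercase b's
/x ? 1 : 0;

print "@all | @positions | $ok\n"; # AAA AAA | 7 15 | 1
```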

The substitution operator (s///)

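The substitution operator changes the contents of a string. It takes two operands in the form s/OPERAND1/OPERAND2/, where OPERAND1 is the pattern to look for and OPERAND2 is the text that replaces it. It returns the number of substitutions made.
Example: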
       $chcemNahradit = "bbb";
       $nahradTextom = "1234567890";
       $_ = "AAA bbb AAA";
       $vysledok = s/$chcemNahradit/$nahradTextom/;   # $vysledok is 1, $_ is "AAA 1234567890 AAA"
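Because s/// returns the number of substitutions it made, or an empty (false) string when the pattern was not found, its result can be tested directly. A minimal sketch reusing the variable names of the example above; $after and $second are illustrative names.

```perl
use strict;
use warnings;

my $chcemNahradit = "bbb";
my $nahradTextom  = "1234567890";

$_ = "AAA bbb AAA";

# First call: "bbb" is found, so one substitution is made
my $vysledok = s/$chcemNahradit/$nahradTextom/;
my $after = $_;

# Second call: nothing is left to replace, so the result is false
my $second = s/$chcemNahradit/$nahradTextom/;

print "substitutions: $vysledok, text: $after\n";
print "nothing left to replace\n" unless $second;
```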

The substitution operator is also often used to delete a substring; the replacement is simply left empty:
Example:
        s/bbb//;
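Without /g only the first occurrence is deleted, and the surrounding whitespace stays. A small sketch on an illustrative string showing both cases; the \s* in the second pattern is an added assumption that also removes the space in front of each deleted word.

```perl
use strict;
use warnings;

$_ = "AAA bbb AAA bbb";
s/bbb//;                  # removes only the first "bbb"
my $first_only = $_;      # "AAA  AAA bbb" (two spaces remain)

$_ = "AAA bbb AAA bbb";
s/\s*bbb//g;              # /g removes every "bbb" plus the whitespace before it
my $all_removed = $_;     # "AAA AAA"

print "[$first_only] [$all_removed]\n";
```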

If a kind of bracket is used as the delimiter around the search pattern, the replacement must be enclosed in its own, second pair of delimiters (usually the same kind of brackets).
Example:
       $_ = "AAA bbb AAA";
       $vysledok = s{bbb}{1234567890};
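A sketch of bracket delimiters with s///, using illustrative strings. The second pair may be a different bracket type, and braces are handy whenever the pattern itself contains slashes.

```perl
use strict;
use warnings;

$_ = "AAA bbb AAA";
my $vysledok = s{bbb}{1234567890};   # pattern and replacement each in braces
my $braces = $_;                      # "AAA 1234567890 AAA"

$_ = "AAA bbb AAA";
s{bbb}(xyz);                          # the second pair may be a different bracket type
my $mixed = $_;                       # "AAA xyz AAA"

$_ = "/usr/local/bin";
s{/usr/local}{/opt};                  # slashes need no escaping inside braces
my $path = $_;                        # "/opt/bin"

print "$braces | $mixed | $path\n";
```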


Substitution operator options

Option Description
e Evaluates OPERAND2 as a Perl expression and uses its result as the replacement.
g Replaces all occurrences of the pattern in the string.
i Ignores case.
m Treats the string as multiple lines, so ^ and $ match at the start and end of each line.
o Compiles the pattern only once (be careful when using it in loops).
s Treats the string as a single line, so . also matches a newline.
x Enables the extended syntax.
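A minimal sketch of /e and of /g combined with /i, on illustrative strings. With /e the replacement part is run as Perl code; $1 holds the text matched by the parentheses in the pattern.

```perl
use strict;
use warnings;

# /e: the replacement "$1 * 2" is run as Perl code, doubling each number
$_ = "2 apples and 15 pears";
s/(\d+)/$1 * 2/ge;
my $doubled = $_;                  # "4 apples and 30 pears"

# /g with /i: every "bbb" is replaced, whatever its case
$_ = "AAA bbb BBB";
my $count = s/bbb/x/gi;            # 2
my $replaced = $_;                 # "AAA x x"

print "$doubled\n$replaced ($count substitutions)\n";
```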

The binding operators (=~ and !~)

By default, the regular-expression operators work on the contents of the $_ variable. To apply them to another variable, bind it with the =~ operator; !~ works the same way for a match but returns the negated result.
Example:
       $premenna = "Jozko has a new computer";
       print("Yes, Jozko has a computer\n") if $premenna =~ m/computer/;


       $premenna = "Jozko has a new computer";
       print("Jozko has a computer\n") if $premenna !~ m/new/;   # prints nothing: the string contains "new"
       $premenna =~ s/Jozko/Ferko/;
       print("\$premenna = $premenna\n");
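The binding operator works with tr/// as well, and !~ is simply the negation of =~. A minimal sketch; the sample sentence and the word "laptop" are illustrative, and the copy into $upper keeps the original variable unchanged.

```perl
use strict;
use warnings;

my $premenna = "Jozko has a new computer";

# =~ also binds tr///: uppercase a copy and count the changed letters
my $upper = $premenna;
my $changed = ($upper =~ tr/a-z/A-Z/);   # 19 letters were lowercase

# !~ is true exactly when the corresponding =~ match is false
my $missing = ($premenna !~ m/laptop/) ? 1 : 0;   # 1
my $present = ($premenna =~ m/laptop/) ? 1 : 0;   # 0

print "$upper ($changed changed), missing=$missing, present=$present\n";
```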

Building regular expressions


Meta-Character Description
^ This meta-character, the caret, matches the beginning of the string or, if the /m option is used, the beginning of any line. It is one of the two pattern anchors; the other is $.
. This meta-character matches any character except the newline. If the /s option is specified, it matches the newline as well.
$ This meta-character matches the end of the string or, if the /m option is used, the end of any line. It is one of the two pattern anchors; the other is ^.
| This meta-character, called alternation, lets you specify alternatives, any one of which makes the match succeed. For instance, m/a|b/ succeeds if $_ contains the character "a" or "b".
* This meta-character indicates that the "thing" immediately to its left must be matched 0 or more times.
+ This meta-character indicates that the "thing" immediately to its left must be matched 1 or more times.
? This meta-character indicates that the "thing" immediately to its left must be matched 0 or 1 times. When it follows the *, +, ?, or {n,m} quantifiers, it makes that quantifier non-greedy, so the regular expression matches the smallest possible string.
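A sketch of the meta-characters above on illustrative strings: ^ with and without /m, . with and without /s, greedy versus non-greedy repetition, and alternation. Parentheses are used here only to capture the part of interest.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ anchors only at the start of the whole string
my @starts_plain = $text =~ m/^(\w+)/g;       # ("first")
# With /m, ^ also anchors right after every newline
my @starts_multi = $text =~ m/^(\w+)/mg;      # ("first", "second")

# Without /s, . does not match the newline; with /s it does
my $dot_plain  = ($text =~ m/line.second/)  ? 1 : 0;   # 0
my $dot_single = ($text =~ m/line.second/s) ? 1 : 0;   # 1

# Greedy .* takes as much as possible, non-greedy .*? as little as possible
my ($greedy)     = "<b>bold</b>" =~ m/<(.*)>/;     # "b>bold</b"
my ($non_greedy) = "<b>bold</b>" =~ m/<(.*?)>/;    # "b"

# Alternation: either spelling satisfies the match
my $alt = ("grey" =~ m/gray|grey/) ? 1 : 0;        # 1

print "@starts_plain | @starts_multi | $dot_plain $dot_single | $greedy | $non_greedy | $alt\n";
```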

Meta-Brackets Description
() The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section "Pattern Memory" later in this chapter for more information.
(?...) If a question mark immediately follows the left parentheses, it indicates that an extended mode component is being specified. See the section "Example: Extension Syntax" later in this chapter for more information.
{n,m} The curly braces let you specify how many times the "thing" immediately to the left should be matched. {n} means that it must be matched exactly n times, {n,} means at least n times, and {n,m} means at least n times but no more than m times.
[] The square brackets create a character class. For instance, m/[abc]/ evaluates to true if any of "a", "b", or "c" is contained in $_. For single characters, the square brackets are a more readable alternative to the alternation meta-character.
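A minimal sketch of the three kinds of brackets, using illustrative data: parentheses remember parts of the match in $1, $2, $3, curly braces limit the number of repetitions, and square brackets build a character class.

```perl
use strict;
use warnings;

my $date = "2024-03-15";   # illustrative value

# Parentheses remember what they matched in $1, $2, $3
my ($year, $month, $day);
if ($date =~ m/^(\d{4})-(\d{2})-(\d{2})$/) {
    ($year, $month, $day) = ($1, $2, $3);
}

# {n,m}: the group "ab" must occur 2 to 3 times, anchored at both ends
my $three = ("ababab"   =~ m/^(ab){2,3}$/) ? 1 : 0;   # 1
my $four  = ("abababab" =~ m/^(ab){2,3}$/) ? 1 : 0;   # 0

# [abc]: any one of the characters "a", "b" or "c"
my @hits = grep { m/[abc]/ } ("cat", "dog", "bee");  # ("cat", "bee")

print "$year/$month/$day $three $four @hits\n";
```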

Meta-Sequences Description
\ This meta-character "escapes" the following character, which means that any special meaning normally attached to that character is ignored. For instance, to include a dollar sign in a pattern, use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character itself.
\nnn Any octal byte. Use zero padding for values from \000 to \077 inclusive; for larger values simply use the three-digit number (like \100 or \323).
\a Alarm (bell).
\A Matches only at the beginning of the string. Its meaning is not affected by the /m option.
\b Inside a character class, represents the backspace character; otherwise, matches a word boundary. A word boundary is the spot between a word (\w) and a non-word (\W) character; the positions before the start and after the end of the string count as non-word for this purpose.
\B Matches a non-word boundary.
\cX Control character Ctrl-X (for example, \cC is Ctrl-C).
\d Matches a single digit character.
\D Matches a single non-digit character.
\e Escape.
\E Ends a \L, \U, or \Q sequence.
\f Form feed.
\G Matches only where the previous m//g left off.
\l Changes the next character to lowercase.
\L Changes the following characters to lowercase until \E is encountered.
\n Newline.
\Q Quotes regular-expression meta-characters (treats them literally) until \E is encountered.
\r Carriage return.
\s Matches a single whitespace character.
\S Matches a single non-whitespace character.
\t Tab.
\u Changes the next character to uppercase.
\U Changes the following characters to uppercase until \E is encountered.
\v Matches a single vertical whitespace character (Perl 5.10 and later); a literal vertical tab can be written \x0B.
\w Matches a single word character. Word characters are alphanumeric characters and the underscore.
\W Matches a single non-word character.
\xnn Any hexadecimal byte.
\Z Matches at the end of the string, or just before a newline at the very end. Its meaning is not affected by the /m option.
\$ Dollar sign.
\@ At sign (prevents array interpolation).
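A sketch of several meta-sequences from the table, on illustrative strings: \b for whole words, \d and \s, \Q...\E to make an interpolated variable literal, \u and \U for case changes, and \A compared with ^ under /m.

```perl
use strict;
use warnings;

# \b: "cat" only as a whole word
my @words = grep { m/\bcat\b/ } ("cat", "concat", "the cat sat");   # ("cat", "the cat sat")

# \d and \s: a number followed by whitespace
my ($qty) = "qty 42 pcs" =~ m/(\d+)\s/;                            # "42"

# \Q...\E: metacharacters in an interpolated variable match literally
my $literal  = "1+1";
my $quoted   = ("1+1=2" =~ m/^\Q$literal\E=/) ? 1 : 0;             # 1
my $unquoted = ("1+1=2" =~ m/^$literal=/)      ? 1 : 0;            # 0: "+" means "one or more"

# \u in a replacement, \U...\E in a double-quoted string
(my $title = "hello world") =~ s/(\w+)/\u$1/g;                     # "Hello World"
my $upper = "\Uabc\E-def";                                         # "ABC-def"

# \A ignores /m, while ^ with /m also matches after a newline
my $anchored = ("a\nb" =~ m/\Ab$/m) ? 1 : 0;                       # 0
my $caret    = ("a\nb" =~ m/^b$/m)  ? 1 : 0;                       # 1

print "@words | $qty | $quoted $unquoted | $title | $upper | $anchored $caret\n";
```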

Quantifier Description
* The component must be present zero or more times.
+ The component must be present one or more times.
? The component must be present zero or one times.
{n} The component must be present n times.
{n,} The component must be present at least n times.
{n,m} The component must be present at least n times and no more than m times.

Example:
       $_ = "AA AB AC AD AE";
       m/^(\w+\W*){5}$/;


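In this example, each repetition of the group looks for at least one word character followed by zero or more non-word characters, and {5} requires exactly five such repetitions between the start (^) and the end ($) of the string. Because \W* may also match nothing, the last word, AE, does not need anything after it.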
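A small check of the quantifier example, with a few illustrative variants. It shows why \W* rather than \W+ is needed for the last word, and also a subtlety: because \w+ may stop in the middle of a word, a string with fewer than five words can still be split into five groups, while six words can never fit into five.

```perl
use strict;
use warnings;

my $text = "AA AB AC AD AE";

my $star = ($text =~ m/^(\w+\W*){5}$/) ? 1 : 0;                 # 1: \W* lets "AE" end the string
my $plus = ($text =~ m/^(\w+\W+){5}$/) ? 1 : 0;                 # 0: \W+ needs something after "AE"
my $six  = ("AA AB AC AD AE AF" =~ m/^(\w+\W*){5}$/) ? 1 : 0;   # 0: six words need six groups
my $four = ("AA AB AC AD" =~ m/^(\w+\W*){5}$/) ? 1 : 0;         # 1: "AA" is split into "A" + "A "

print "$star $plus $six $four\n";   # 1 0 0 1
```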

Extension Description
(?# TEXT) This extension lets you add comments to your regular expression. The TEXT value is ignored.
(?:...) This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.
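(?=...) This extension (a positive lookahead) requires that the enclosed pattern match at the current position, without including the text it matches in the overall match, so that text does not appear in the $& variable.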
(?!...) This extension lets you specify what should not follow your pattern. For instance, /blue(?!bird)/ means that "bluebox" and "bluesy" will be matched but not "bluebird".
(?sxi) This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable interpolation to do the matching.
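A minimal sketch of each extension from the table, on illustrative strings; the blue/bluebird words come from the table itself. The (?i) case stores the option inside a pattern held in a variable, as described in the last row.

```perl
use strict;
use warnings;

# (?# ...) comment, ignored by the regex engine
my $c = ("abc" =~ m/a(?# this is ignored)bc/) ? 1 : 0;            # 1

# (?:...) groups without capturing: only (\d+) sets $1
my ($num) = "item-42" =~ m/(?:item|part)-(\d+)/;                 # "42"

# (?=...) lookahead: "price" must be followed by a digit,
# but the digit is not part of $&
my $matched = "";
if ("price9" =~ m/price(?=\d)/) {
    $matched = $&;                                               # "price"
}

# (?!...) negative lookahead
my @blue = grep { m/blue(?!bird)/ } ("bluebox", "bluesy", "bluebird");  # ("bluebox", "bluesy")

# (?i) embedded option inside a pattern stored in a variable
my $pattern = "(?i)hello";
my $ci = ("HeLLo there" =~ m/$pattern/) ? 1 : 0;                  # 1

print "$c | $num | $matched | @blue | $ci\n";
```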