    Express Yourself Regularly

    2005 SAP AG 1

    Applies To:

    ABAP, NetWeaver 2004s

    Summary

    One can use regular expressions in one's code with NetWeaver 2004s. However, since not everybody is acquainted with this relatively old technology, this tutorial will try to explain the basics.

    By: Eddy De Clercq

    Company: Katholieke Universiteit Leuven

    Date: 01 April 2006

    Having simple tastes in life, I seek happiness in small things. That is also the case when a new SAP release is announced. I was happy when Karl Kessler said that ABAP wasn't locked out from regular expressions. In fact, I was more than satisfied when I saw RE as one of the features of 2004s. Why am I so happy? When I wrote the BSP port for the Honeypot Project a year ago, I moaned about the fact that something as simple as stripping out non-alphanumeric characters couldn't be done in a one-liner.

    Whereas this can be done within PHP via

    ereg_replace("[^a-zA-Z0-9]","",$contents)

    ABAP/BSP needs this code

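    WHILE origStr IS NOT INITIAL.
      " Keep the first character only if it is a letter or a digit.
      IF origStr(1) CA '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.
        CONCATENATE newStr origStr(1) INTO newStr.
      ENDIF.
      " Drop the first character and continue with the rest.
      origStr = origStr+1(*).
    ENDWHILE.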

    You cannot call this very elegant, can you? Luckily, with NetWeaver 2004s this is now history.
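As a point of comparison, the same one-liner can be sketched in Python, which is used here only as a freely runnable stand-in and not as ABAP syntax. The function name strip_non_alphanumeric and the sample input are made up for illustration. In ABAP itself, the REGEX addition to the FIND and REPLACE statements serves this purpose; the ABAP keyword documentation has the exact syntax.

```python
import re


def strip_non_alphanumeric(contents: str) -> str:
    """Remove every character that is not a letter or a digit."""
    # [^a-zA-Z0-9] is a negated character set: anything but these ranges.
    return re.sub(r"[^a-zA-Z0-9]", "", contents)


cleaned = strip_non_alphanumeric("SAP-Dev 2004s!")
print(cleaned)  # SAPDev2004s
```

The whole WHILE loop collapses into a single substitution, which is the point of the complaint above.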

    Mathematics

    Most people know regular expressions from Ken Thompson's QED editor, but the godfather of regular expressions is Stephen Kleene, who defined a notation he called the algebra of regular sets. Some of you might know that the * wildcard in a search is also called the Kleene star. Kleene is also known for his work on recursion, together with people like Turing.

    Being a mathematician, he helped lay the basis for theoretical computer science as we know it. There is a drawback though. As with many small but powerful things, it can get rather complicated. On top of that, there are different types of regular expressions: Perl, Tcl, Python, etc. each have their own version. SAP makes use of the POSIX variant, a.k.a. modern extended regular expressions. I agree that it is all a bit confusing, and even I sometimes need to read or think things over twice in order to understand them thoroughly.

    I therefore decided to make a tutorial that could be very useful for the SDN community. Sure, there is already a lot of reference material available. Most of it just sums up the semantics though. I thought it would be nice if things were explained by way of some examples.

    Traditional expressions

    In this part we will discuss the basic principles. Let's take "SAP Developers Network is the place to be for SAP Developers." as the text that you will be searching in. Let's start searching. To begin with, you just need to use the text that you're looking for as the search pattern. A pattern is always built up character by character. The regex jargon for this is literal characters. A first example:

    SAP will match the first "SAP" in "SAP Developers Network is the place to be for SAP Developers."

    It seems obvious that this matches, but it isn't as obvious as you might think at first glance. It depends a bit on the regex engine. There are text-directed engines and regex-directed engines, also known as DFA and NFA engines respectively. In addition, as if that's not enough, there is also POSIX NFA. Since the POSIX standard is supported in NW2004s, we will concentrate on the NFA engine.

    The basic rule is that it is always the leftmost match that will be returned. That means that the engine starts by trying to match the first character of the search string with the first character of the text that it is searching in. If that doesn't match, it'll go on to the next character in the text, and then the next, until it finds a first match. When it finds a match for this first character, it'll continue with the next character of the search string, and so on, until either the whole search string has been matched or every starting position in the text has been tried.

    If we apply this to the above example, it works like this. First the S character is checked. It has a match, thus it continues with the A of the search string, and so forth.
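The scanning procedure described above can be sketched for purely literal patterns. This is a simplified illustration in Python, not how a real regex engine is implemented; the function name find_leftmost and the constant TEXT are assumptions made for this example.

```python
def find_leftmost(pattern: str, text: str) -> int:
    """Return the index of the leftmost occurrence of a literal pattern, or -1."""
    # Try every possible starting position in the text, from left to right.
    for start in range(len(text) - len(pattern) + 1):
        matched = True
        # Compare the pattern character by character from this position.
        for offset, char in enumerate(pattern):
            if text[start + offset] != char:
                matched = False
                break
        if matched:
            # The first success is by construction the leftmost match.
            return start
    return -1


TEXT = "SAP Developers Network is the place to be for SAP Developers."
print(find_leftmost("SAP", TEXT))  # 0
print(find_leftmost("Dev", TEXT))  # 4
print(find_leftmost("Sap", TEXT))  # -1
```

Although "SAP" occurs twice in the text, only position 0 is reported, because the search stops at the first starting position that succeeds.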

    As with many of these things, the search string is case sensitive.

    Sap will not match

    Each character is significant, thus spaces are too.

    SAP Dev will match "SAP Dev" in "SAP Developers Network is the place to be for SAP Developers."

    SAPDev will not match
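The literal examples above can be tried out with a small sketch. It uses Python's re module as a stand-in, which is an assumption of this example; ABAP's POSIX-style engine may differ in details, although not for simple literals like these. The helper name matches is made up.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."


def matches(pattern: str, text: str = TEXT) -> bool:
    """Report whether the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


print(matches("SAP"))      # True
print(matches("Sap"))      # False: literals are case sensitive
print(matches("SAP Dev"))  # True: the space is part of the pattern
print(matches("SAPDev"))   # False: the text has a space after SAP
```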

    If you don't know which character will follow, you can use a placeholder character: the full stop (.). It matches any single character.

    SAP.Dev will match "SAP Dev" in "SAP Developers Network is the place to be for SAP Developers.", since the full stop matches the space

    If you want to search for the full stop itself, you'll need to escape it. Escaping is done via a backslash (\).

    . will match the first character, "S", of "SAP Developers Network is the place to be for SAP Developers."

    \. will match only the full stop at the end of the text

    .\. will match "s." at the end of "SAP Developers Network is the place to be for SAP Developers."

    \.. will not match, since there is no character after the full stop in the text
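The difference between the full stop as a placeholder and the escaped full stop can be checked with a short sketch, again in Python's re module as an assumed stand-in for the ABAP engine. The variable names are illustrative.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

any_char = re.search(r".", TEXT)           # unescaped: any single character
wildcard = re.search(r"SAP.Dev", TEXT)     # the full stop stands in for the space
literal_dot = re.search(r"\.", TEXT)       # escaped: a real full stop
char_then_dot = re.search(r".\.", TEXT)    # any character followed by a full stop
dot_then_char = re.search(r"\..", TEXT)    # a full stop followed by any character

print(any_char.group())       # S
print(wildcard.group())       # SAP Dev
print(literal_dot.start())    # 60, the last position in the text
print(char_then_dot.group())  # s.
print(dot_then_char)          # None: nothing follows the full stop
```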

    You can specify whether things must be at the start (^) or at the end ($) of the line.

    ^SAP will match the "SAP" at the start of "SAP Developers Network is the place to be for SAP Developers."

    Developers\.$ will match "Developers." at the end of "SAP Developers Network is the place to be for SAP Developers."

    Developers$ will not match, since the full stop, which is at the end of the text, was omitted
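A brief sketch of the two anchors, using Python's re module as an assumed stand-in for the ABAP engine; the variable names are illustrative only.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

at_start = re.search(r"^SAP", TEXT)             # anchored to the start
at_end = re.search(r"Developers\.$", TEXT)      # anchored to the end
missing_dot = re.search(r"Developers$", TEXT)   # the final full stop is not covered

print(at_start.start())  # 0: only the first SAP can satisfy ^
print(at_end.start())    # 50: only the second Developers can satisfy $
print(missing_dot)       # None
```

Note that ^SAP can never match the second "SAP", even though the letters are the same, because that occurrence is not at the start of the line.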

    As with the full stop and all other special characters, you need to escape ^, $ or \ if you want to look for the character itself.

    ^ also has another meaning when used in conjunction with square brackets []. Inside those square brackets you can provide a list of characters, and the pattern matches any single one of them. The characters between square brackets are often called character sets or character classes. For the purpose of clarity, I will show all matches and not just the first match as I did before.

    [SDN] will match each S, D and N in "SAP Developers Network is the place to be for SAP Developers."

    [EDC] will match only the D of each "Developers", since the text contains no uppercase E or C

    [XYZ] will not match

    If such a character set starts with ^, the specified characters will not be used to match. In other words, it'll match everything but the characters specified.

    [^SDN] will match every character except S, D and N in "SAP Developers Network is the place to be for SAP Developers."
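Character sets and their negation can be explored with a sketch that lists all matches rather than just the first one. Python's re.findall is used as an assumed stand-in for the ABAP engine, and the variable names are made up for this example.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

# Every single character that is one of S, D or N.
in_set = re.findall(r"[SDN]", TEXT)
print(in_set)  # ['S', 'D', 'N', 'S', 'D']

# Every single character that is NOT one of S, D or N, glued back together.
not_in_set = "".join(re.findall(r"[^SDN]", TEXT))
print(not_in_set)  # AP evelopers etwork is the place to be for AP evelopers.
```

The two result lists are complementary: each character of the text ends up in exactly one of them.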

    In order to avoid having to list all the characters, as in the old ABAP way, you can define ranges with the hyphen (-). ABCDEFGHIJKLMNOPQRSTUVWXYZ can thus be shortened to A-Z, abcdefghijklmnopqrstuvwxyz to a-z (things are case sensitive, remember) and 0123456789 to 0-9.

    [A-Z] will match every uppercase letter in "SAP Developers Network is the place to be for SAP Developers."

    You can also combine character sets with plain characters.

    [DN]e will match "De", "Ne" and "De" in "SAP Developers Network is the place to be for SAP Developers."
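Ranges and the combination of a set with a plain character can be verified in the same way. This is a sketch in Python's re module, assumed here as a stand-in for the ABAP engine; the variable names are illustrative.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

uppercase = re.findall(r"[A-Z]", TEXT)    # range instead of 26 listed letters
digits = re.findall(r"[0-9]", TEXT)       # the text contains no digits
set_then_e = re.findall(r"[DN]e", TEXT)   # D or N, followed by a literal e

print("".join(uppercase))  # SAPDNSAPD
print(digits)              # []
print(set_then_e)          # ['De', 'Ne', 'De']
```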

    Repetitions

    Repetition is a very interesting feature that enables you to specify how many times a character and/or character class needs to be matched. There are a couple of metacharacters to enable this:

    ? indicates that the preceding character is optional, meaning that it can occur zero times or once.

    Developers? will match both occurrences of "Developers" in "SAP Developers Network is the place to be for SAP Developers."

    Developers? would also match a bare "Developer", since the trailing s is optional

    * indicates that the preceding character can occur any number of times, meaning 0 to n time(s).

    De*v will match "Dev" in "SAP Developers Network is the place to be for SAP Developers."

    Dea*v will also match "Dev" in "SAP Developers Network is the place to be for SAP Developers.", since the a may occur zero times. Da*v, however, will not match, because the e between D and v is not accounted for.

    + indicates that the preceding character needs to occur at least once.

    De+v will match "Dev" in "SAP Developers Network is the place to be for SAP Developers."

    Da+v will not match, since no a follows the D
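The three repetition metacharacters can be compared side by side in a short sketch. Python's re module is assumed as a stand-in for the ABAP engine, and the second sample string "SAP Developer Network" is invented for this example.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

# ? : the preceding character may occur zero times or once.
with_s = re.search(r"Developers?", TEXT)
without_s = re.search(r"Developers?", "SAP Developer Network")
print(with_s.group())     # Developers
print(without_s.group())  # Developer

# * : the preceding character may occur zero or more times.
one_e = re.search(r"De*v", TEXT)
zero_a = re.search(r"Dea*v", TEXT)
print(one_e.group())   # Dev
print(zero_a.group())  # Dev: the a occurs zero times

# + : the preceding character must occur at least once.
at_least_one_e = re.search(r"De+v", TEXT)
at_least_one_a = re.search(r"Da+v", TEXT)
print(at_least_one_e.group())  # Dev
print(at_least_one_a)          # None
```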

    You can be even more specific in determining how many times a character may occur. This can be done via curly brackets {}. Within these curly brackets, you can set the minimum and maximum number of occurrences: {X,Y}, where X is the minimum and Y the maximum. Y is optional: {X} means exactly X times and {X,} means at least X times.

    .{10} will match the first ten characters, "SAP Develo", of "SAP Developers Network is the place to be for SAP Developers."

    [A-Z]{1,5} will match "SAP" in "SAP Developers Network is the place to be for SAP Developers."

    [A-Z]{5} will not match, since the text never has five uppercase letters in a row

    This means that the previously mentioned metacharacters can be replaced by curly brackets: ? is equivalent to {0,1}, * to {0,} and + to {1,}.

    Developers{0,1} and Developers? will both match "Developers" in "SAP Developers Network is the place to be for SAP Developers."

    De*v and De{0,}v will both match "Dev" in "SAP Developers Network is the place to be for SAP Developers."
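Counted repetitions, and how they relate to the shorthand metacharacters, can be sketched as follows. Python's re module is assumed as a stand-in for the ABAP engine; the variable names are illustrative.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

first_ten = re.search(r".{10}", TEXT)           # exactly ten characters
one_to_five = re.search(r"[A-Z]{1,5}", TEXT)    # one to five uppercase letters
exactly_five = re.search(r"[A-Z]{5}", TEXT)     # five uppercase letters in a row

print(first_ten.group())    # SAP Develo
print(one_to_five.group())  # SAP
print(exactly_five)         # None

# Each shorthand metacharacter has a curly-bracket spelling.
same_as_question = re.findall(r"Developers{0,1}", TEXT) == re.findall(r"Developers?", TEXT)
same_as_star = re.findall(r"De{0,}v", TEXT) == re.findall(r"De*v", TEXT)
same_as_plus = re.findall(r"De{1,}v", TEXT) == re.findall(r"De+v", TEXT)
print(same_as_question, same_as_star, same_as_plus)  # True True True
```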

    The next step is to group characters in order to apply a repetition to the group as a whole. This can be done via round brackets. Followed by a ?, these brackets/parentheses indicate that the whole group is optional.

    SAP.Dev(elopers)? will match "SAP Developers" in "SAP Developers Network is the place to be for SAP Developers."

    it will also match a bare "SAP Dev", since the group (elopers) is optional

    With these parentheses, you can also specify alternatives via |

    (SAP|PHP).Dev will match "SAP Dev" in "SAP Developers Network is the place to be for SAP Developers."
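Grouping and alternatives can be tried out with one last sketch, in Python's re module as an assumed stand-in for the ABAP engine. The extra sample strings ("SAP Dev team", "PHP Developers", "Java Developers") are invented for illustration.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

# A group followed by ? is optional as a whole.
optional_group = r"SAP.Dev(elopers)?"
long_form = re.search(optional_group, TEXT)
short_form = re.search(optional_group, "SAP Dev team")
print(long_form.group())   # SAP Developers
print(short_form.group())  # SAP Dev

# | separates alternatives inside a group.
alternatives = r"(SAP|PHP).Dev"
sap = re.search(alternatives, TEXT)
php = re.search(alternatives, "PHP Developers")
java = re.search(alternatives, "Java Developers")
print(sap.group())  # SAP Dev
print(php.group())  # PHP Dev
print(java)         # None: Java is not one of the alternatives
```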

    Summing it up

    I want to finish with an overview of the things covered in this tutorial.

    A character   Will match if the character matches. Characters are case sensitive
    .             Is a replacement for a single character. If you want to find a full stop you need to escape it with backslash (\)
    ^             Will match at the start of the line
    $             Will match at the end of the line
    []            A list of characters that needs to be matched
    -             A range of characters
    ?             Optional character
    *             Can occur 0 to n times
    +             Has to occur at least once
    {X,Y}         Has to occur for a minimum of X and maximum of Y times. Y is optional
    ()            Group of characters
    |             Specifies alternatives

    This doesn't cover everything about regular expressions. There is much more to it, which I will try to cover in a later tutorial.

    Disclaimer & Liability Notice

    This document may discuss sample coding, which does not include official interfaces and therefore is not supported. Changes made based on this information are not supported and can be overwritten during an upgrade.

    SAP will not be held liable for any damages caused by using or misusing the code and methods suggested here, and anyone using these methods is doing it under his/her own responsibility.

    SAP offers no guarantees and assumes no responsibility or liability of any type with respect to the content of the technical article, including any liability resulting from incompatibility between the content of the technical article and the materials and services offered by SAP. You agree that you will not hold SAP responsible or liable with respect to the content of the Technical Article or seek to do so.

    Author Bio

    Eddy De Clercq has 20 years' experience in computing. He currently works at the Katholieke Universiteit Leuven, the oldest university of the Low Countries and the largest Flemish university. Eddy is a member of the E-university team that creates self-service (web) applications.

    Copyright 2005 SAP AG, Inc. All Rights Reserved. SAP, mySAP, mySAP.com, xApps, xApp, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product, service names, trademarks and registered trademarks mentioned are the trademarks of their respective owners.