John Rickman RISC OS Notes

StrongED Search and Replace

# Define a key to replace tab by comma
Tying this S&R to a key is easy. Open the modes menu by clicking Adjust
on StrongED's iconbar icon. Click Shift-Select on the name of the mode
in which you want to add the keypress. In the ModeFile opened, find the
KeyList (if it's BaseMode then find the main KeyList). Pick a suitable
key combination and assign it the following:

     Replace("\t",",",Text,NoLine,NoCase)

   Save the file when done and the new keypress should be available.

# Define a key to replace many new lines with one
In order to do this you'll need to add a search pattern to the Search
section of the relevant ModeFile. You'll also need to add a Replace
pattern to the Replace section. Finally you'll need to add a binding to
the KeyList, the main KeyList if you want to add it to BaseMode.
 (Fred)

I'm going to assume it's BaseMode you want to alter.

Ctrl-Adjust-click on StrongED's iconbar icon to load the BaseMode
ModeFile. Then click on the LoF icon on the toolbar to get a list of
various sections in the ModeFile.

Click on the 'Search' line in the LoF to move to that section and add
the following line:

  ManyNewlines      {\n}+

Next move to the 'Replace' section and add this line:

  SingleNewline     \n

Finally, click on the 'KeyList' line in the LoF, find the line that
defines ctrl-tab (^Tab) and replace it with:

  ^Tab        Replace(ManyNewlines,SingleNewline,Text,NoLine,NoCase)

Now save the file (to UserPrefs if it's not already there) and you
should have it working.
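
For anyone more at home with regular expressions, the effect of the
ManyNewlines/SingleNewline pair can be sketched in Python. This is only
an analogy using Python's re module, not StrongED syntax, and the
function name is made up for illustration.

```python
import re

def collapse_newlines(text: str) -> str:
    # StrongED's {\n}+ corresponds roughly to \n+ in Python's re:
    # every run of one or more newlines becomes a single newline.
    return re.sub(r"\n+", "\n", text)

print(repr(collapse_newlines("one\n\n\ntwo\nthree\n\n")))
```

Note that re.sub replaces every run in the text in one pass.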


# SEARCH examples
The StrongED manual now has a good explanation of search and replace,
see >>Introduction > Search and Replace2

But here are some examples which work:

eg - find a date at start of line
     search                  ##"."##"."##

eg - everything up to and including the second tab character on a line
     search                   *\t*\t

eg - everything up to and including the second tab character
     search                   **\t**\t
     note: ** ignores newlines

eg - "fold" but not "folder"
     search                   "fold" ~ "e"

eg - all rows that end in INTMETRICS
     search                   < * "INTMETRICS">
     <  start of line
     *     followed by anything
     "INTMETRICS" followed immediately
     >         by end of line


eg - matches    ","11
     search                   punct","punct##" "
   but could be coded as:-
     search                    \"\,\"##" "

eg - string on its own on the line "*Report \g"
     search                   "*Report "\\"g"~Any

eg - SYS  - but NOT SYS "Wimp...
     search                   "SYS"..~"Wimp"{Any}4

eg - Ifconfig and Ipconfig
     search                   "I" 'fp' "config"

eg - ignore till "!" then anything but "."   Fred Graute
          (all applications in an EnumDir list)
     search                   < * "!" {~"." Any}+ >

eg - all words beginning with A
    search                    White 'aA' {AD}
         find all text which is NOT words beginning with A
    search                    ~'aA' ** (White 'aA' {AD})

eg - all rows that don't end in readme
      bad search                      < ~"readme" Any>
      will match any character that isn't an "r" at the start
      of "readme", for example the sentence:

       This is a readme file
       ^^^^^^^^^^ ^^^^^^^^^^
      produces matches for all characters indicated by "^".

      What you want is something like this:

       ~"readme" {Any}6 >

      Which checks if the string at the search pointer isn't "readme".

      If true then match the next 6 characters, because "readme" is 6
      characters long. Then test if we're at end of line, if so then
      we have a match.

      If false then the match fails, in which case StrongED will bump
      the search pointer ahead one character and try the search
      expression again.
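
      The same "test first, then consume" idea can be tried out in
      Python, where a negative lookahead (?!...) behaves much like ~.
      This is a rough analogy only, not StrongED syntax; the names are
      invented for the sketch.

```python
import re

# (?!readme) plays the part of ~"readme": it tests without consuming text.
# .{6} plays the part of {Any}6, and $ (with re.MULTILINE) the part of >.
NOT_README_AT_END = re.compile(r"(?!readme).{6}$", re.MULTILINE)

def has_match(line: str) -> bool:
    # True when some 6-character run at end of line is not "readme".
    return NOT_README_AT_END.search(line) is not None

print(has_match("This is a file!!"))   # last six characters are not "readme"
print(has_match("This is a readme"))   # last six characters are "readme"
```

      As with the StrongED pattern, a line shorter than 6 characters
      gives no match at all.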


 eg - .... [aaa ........ [bbb] ....    find [bbb]
        search:   "[" { ~'[' ? | ' ' | '.' }+ "]"
        would find only [bbb]

 > a fairly large text file in which I want to remove a whole lot of
 > strings of the form [****], where the **** bit contains alphabetic
 > characters, upper and lower case, spaces and full stops.

 Try    "[" { ? | ' ' | '.' }+ "]"

 > Many thanks, Tony, but I think that just proves that the syntax of
 > advanced search passeth my understanding :-(

 It's not _that_ complicated!

    "[" "]"   bookends the search for
    { }+      one or more characters from
    ?         the predefined set A-Za-z
    |         or
    ' '       the set comprising space
    |         or
    '.'       the set comprising full stop
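
 For comparison, roughly the same search written as a Python regular
 expression. This is a sketch, not StrongED syntax: the character class
 [A-Za-z .] stands in for the three alternatives, and the function name
 is hypothetical.

```python
import re

# "[" ... "]" bookends; inside, one or more letters, spaces or full stops.
BRACKETED = re.compile(r"\[[A-Za-z .]+\]")

def strip_bracketed(text: str) -> str:
    # Remove every [....] string made only of letters, spaces and full stops.
    return BRACKETED.sub("", text)

print(strip_bracketed("keep [Ed. note] this [x1] too"))
```

 The [x1] is left alone because it contains a digit.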

Advanced Search: Examples copied from the StrongED Help manual
If you place something between { and }, then StrongED will match the
pattern against the text until it fails. If you have a "+" after,
then StrongED must find it at least once to report a success.
If there is no "+", then it will always match..

  "B" {"A"}     :       Matches "B", "BA", "BAA", "BAAA" and so on..
  "B" {"A"}+    :       Match "B", but only if followed by at least 1 "A"

If you place something in square brackets, then it is optional.
If it matches the text, then fine.. if not, that's fine too..

  "A" ["B"] "C" :       Matches "ABC" and "AC"

The bar is the OR operator. We first try to match the left side.
If that doesn't work, we try the right side..

  "A" | "B"     :       Matches "A" or "B"
  "A"|"B"|"C"   :       Matches "A" or "B" or "C"

Parentheses are used to group things together..
This is useful in connection with the OR and NOT operators and
with the wildcards..

  "A" | "B" "C"   = "A" or "B", followed by "C"
  "A" | ("B" "C") = "A" or "BC"

The ~ is the NOT operator. It will match what comes after with the text,
and will accept a "match" if it fails, and refuse if it does match.
Note that after a NOT match, we are still at the same point in the text:

  "A" ~ "B"     :       Matches "A" if the following char is not "B"
  "A" ~ "B" Any :       Matches "A" and the following char,
  but only if it's not a "B"..
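
The point that NOT leaves you at the same place in the text can be
seen with Python's negative lookahead, which behaves similarly. This is
an analogy under that assumption, not StrongED syntax.

```python
import re

text = "AB AC A"

# "A" ~ "B": match an A that is not followed by B; only the A is matched.
not_b = re.findall(r"A(?!B)", text)

# "A" ~ "B" Any: the same test, then consume the following character too.
not_b_any = re.findall(r"A(?!B).", text)

print(not_b)
print(not_b_any)
```

The last "A" matches the first pattern but not the second, since there
is no following character for Any to consume.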

There are two wildcard operators. The "*" and the "**"

  "A" * "B"     :       Matches "A", any text on the same line, then "B"
  "A" ** "B"    :       Matches "A", any text (newlines included), then "B"

# Advanced examples

To capture an entire line if it contains two certain words,
irrespective of order..

  < * (("foo" * "bar") | ("bar" * "foo")) * >
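
A rough Python equivalent of the two-words-in-either-order search, for
comparison. It is a sketch using the re module rather than StrongED
syntax; the function name is made up.

```python
import re

# ^ and $ (with re.MULTILINE) stand in for < and >; .* stands in for *.
BOTH_WORDS = re.compile(r"^.*(?:foo.*bar|bar.*foo).*$", re.MULTILINE)

def lines_with_both(text: str) -> list[str]:
    # Return every whole line containing "foo" and "bar" in either order.
    return BOTH_WORDS.findall(text)

print(lines_with_both("foo then bar\nonly foo\nbar and foo\n"))
```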

To match an entire paragraph..

  < ~\n {~(\n\n) (.|\n)}+ \n

To match an entire paragraph if it contains a certain word, in this example it's 'foobar'..

  < ~\n {~(\n\n) ~"foobar" (.|$)} "foobar" {~(\n\n) (.|$)}

To match an entire paragraph if it contains the word at the caret..

  < ~\n {~(\n\n) ~CW (.|$)} CW {~(\n\n) (.|$)}



# Replace examples
  To replace with null use "" not an empty icon - both forms work
  for the replace but recall only works with the ""

eg   insert newline before upper case letter
     search                   @1'A-Z'@2
     replace                  /nl@12

eg   replace a date by the date followed by the string "-->"
     search                   @1##"."##"."##@2
     replace                  @12" --> "
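
     The @1...@2 markers do much the same job as a capture group in a
     regular expression. A Python sketch of the date example, assuming
     # stands for a single digit (an analogy, not StrongED syntax):

```python
import re

def mark_dates(text: str) -> str:
    # (\d\d\.\d\d\.\d\d) captures what lies between @1 and @2;
    # \1 puts it back, like @12, followed by the literal " --> ".
    return re.sub(r"(\d\d\.\d\d\.\d\d)", r"\1 --> ", text)

print(mark_dates("12.03.04 meeting"))
```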

eg - add a string to end of line
     search   /n
     replace  /t" 0"/n
     before   Acland	 juliet.acland@orange.fr
     after    Acland	 juliet.acland@orange.fr	 0

eg - replace C1,C2,C3 by Cv1,Cv2,Cv3
     search   "C"@1#
     replace  "Cv"@19

eg - replace Cv1  ,Cv2  ,Cv3   by Cv1 ,Cv2 ,Cv3
     search   "Cv"@1#@2"  "
     replace  "Cv"@12" "

eg - replace multiple new lines with a single new line
     search   {\n}+
     replace  \n
     only does it once

eg - remove string up to and including a common word keeping
     everything that follows on the same line
     search   *"Then "@1
     replace  @19
     before     IfThere !anyoldapp Then Filer_Run !anyoldapp
     after      Filer_Run !anyoldapp

eg - change array to vector and rename it
     entries are of the form:
       stack(i-1,1) = stack(i,1)
       stack(lastinstack-1,1) = save_col
     these should become:
       scols(i-1) = scols(i)
       scols(lastinstack-1) = save_col

    search    "stack"@1*@2",1"
    replace   "scols"@12


eg - replace first blank after email address with tab
    search    "." * @1 " "
    replace   @01\t

eg - remove everything after email address in Messenger addr list !!!
    search    "@"*@1/t{.}
    replace   @01

eg - swap columns in a table (Archive v17/04)
    search:   <{ad}@1"\t"@2{.}>
    replace:  @29"\t"@01

eg - another swap example
    search:   *\t*\t@1{Any}+
    replace:  @19"   "@01
    before      .t2.PICT3387/JPG    JPEG   2004-10-18 11:02:43
    after        2004-10-18 11:02:43    .t2.PICT3387/JPG  JPEG

  this format can now be sorted into date order
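
  The swap can be mimicked in Python with two capture groups, which
  may help when checking what @01 and @19 pick up. This is a sketch
  assuming the columns are separated by tabs; it is not StrongED syntax
  and the function name is hypothetical.

```python
import re

def date_first(line: str) -> str:
    # Group 1: everything up to and including the second tab (like @01).
    # Group 2: the rest of the line (like @19).
    return re.sub(r"^(.*?\t.*?\t)(.+)$", r"\2   \1", line)

print(repr(date_first(".t2.PICT3387/JPG\tJPEG\t2004-10-18 11:02:43")))
```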

eg - and another swap
    search:   *\t@1{Any}+
    replace:  @19"   "@01
    before      .t2.PICT3387/JPG    JPEG   2004-10-18 11:02:43
    after        JPEG   2004-10-18 11:02:43    .t2.PICT3387/JPG

  this format can now be sorted into filetype date order


eg - split a column in two
    search                    \"\,\"@3##@4" - "@5
    replace                   @03@34@03@59
    :-
    before      "songs of freedom","11 - Bob Marley"
    after       "songs of freedom","11","Bob Marley"

   explanation:       \"      finds the first quote character
                      \,      finds the comma
                      \"      finds the 2nd quote character
                      @3      marks the end of the string
      therefore       @03 is  ","  for replace string
      similarly       @34 picks up the two digits ##
                      @03 is another ","
      and             @59 picks up the remainder of the line
      note            the string " - " is dropped
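
    The same split can be tried in Python, with capture groups taking
    the place of the @03 and @34 marker pairs. A sketch only, assuming
    # means a digit; it is not StrongED syntax.

```python
import re

def split_column(line: str) -> str:
    # Group 1 is the "," separator (like @03), group 2 the two digits
    # (like @34). The " - " is matched but not put back, so it is dropped.
    return re.sub(r'(",")(\d\d) - ', r"\1\2\1", line)

print(split_column('"songs of freedom","11 - Bob Marley"'))
```

    The rest of the line is never part of the match here, so it is
    simply left in place.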

eg - replace trailing comma:
    search                    \,>
    replace                   (null string)

eg - replace rubbish in email attachment:
    search                    "="@1\n
    replace                   @19
     find "=" followed by hex 0A, replace with hex 0A

eg - replace 8 spaces by two tabs
    search                    "        "
    replace                   \x09\x09
    better to use  >>block>process>spaces to tab

  line ends are best handled by >>edit >change \n >.. but..

eg - replace cr with lf       Risc to Mac
    search                    \x0D       **** CASE ***
    replace                   \x0A

eg - replace lf with crlf     Risc to Dos
    search                    @1\x0A@2
    replace                   \x0D@12

eg - replace  crlf with lf    Dos to Risc
    search                    \x0D@1\x0A@2
    replace                   @12
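
    Outside StrongED, the two line-ending conversions are simple byte
    replacements. A Python sketch for checking results (function names
    invented for illustration):

```python
def risc_to_dos(data: bytes) -> bytes:
    # lf -> crlf. Assumes the input has no crlf already,
    # otherwise each existing cr would be doubled up.
    return data.replace(b"\n", b"\r\n")

def dos_to_risc(data: bytes) -> bytes:
    # crlf -> lf
    return data.replace(b"\r\n", b"\n")

sample = b"line one\nline two\n"
print(risc_to_dos(sample))
print(dos_to_risc(risc_to_dos(sample)) == sample)
```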

eg - insert string at start of line with hit
    search                    *".ttc"
    replace                   "string"@09

eg - put tabs after a string
    search                    ".ttc"
    replace                   @09\t\t

eg - replace all text that is not a word beginning A or a
    search                    ~'aA' ** (White @1 'aA' {AD})
    replace                   @09\n
   (see below for explanation)

eg - replace leading blanks
    search                    <{" "}+
    replace                   ""

eg - replace first 7 characters of each line with 7 blanks
    search                    <.......
    replace                   "       "

eg - delete lines containing a string (at the end)
    search                    *"GIF"\n
    replace                   ""

eg - delete null lines
    search                    <\n
    replace                   ""

eg - delete blank lines
    search                    <{" "}+\n
    replace                   ""


> > 'Wrap lines' doesn't remove leading whitespace but as it's a
> > simple S&R that could be changed or a new S&R could be created to
> > cater for this, eg:
> >
> > Search for:   ~NL Any @1 {'\t '} ~(NL NL) NL {'\t '}
>
> Interpretation (hopefully correct): Not a newline, Any string, @1
> marker, zero or more tabs and spaces, Not 2 newlines, newline,
> zero or more tabs and spaces.

Close but for absolute clarity I'll spell it out:

 ~NL      initial char is not a newline (NB doesn't advance searchpointer)
 Any      match any char (here it's used to advance the search pointer)
 @1       set marker #1, by default @0 is start and @9 is end of match
 {'\t '}  zero or more tabs and/or spaces [1]
 ~(NL NL) not 2 consecutive newlines (NB doesn't advance searchpointer)
 NL       match newline
 {'\t '}  zero or more tabs and/or spaces [1]

[1] using a set (aka character class) is faster than using alternation
(\t|" ").

> > Replace with: @01 " "
>
> Interpretation: From start to the @1 marker, followed by a space.

Yep.

> The bit before the marker is a line of text without the trailing
> whitespace. The bit after it is the trailing whitespace,
> the newline,  and the leading whitespace on the next line.

The bit before the marker is the last char on the line.

> The 'not 2 NL' bit presumably prevents a match for lines that are
> followed by a blank line.

Yes.
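
For comparison, the line-joining search and replace discussed above can
be approximated in Python. This is a sketch in re syntax rather than
StrongED syntax, with (?!...) standing in for ~ and a capture group
standing in for the @01 span; the names are made up.

```python
import re

# (?!\n)(.)   not a newline, then any char (kept, like @01)
# [\t ]*      trailing tabs/spaces
# (?!\n\n)\n  a newline that is not the first of two
# [\t ]*      leading tabs/spaces on the next line
JOIN = re.compile(r"(?!\n)(.)[\t ]*(?!\n\n)\n[\t ]*")

def join_lines(text: str) -> str:
    # Keep the last char of the line and follow it with a single space.
    return JOIN.sub(r"\1 ", text)

print(repr(join_lines("first line  \n   second line\n\nnew para")))
```

The blank line between paragraphs survives because of the "not 2
newlines" test.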

> Such a search/replace can be made to operate directly, responding to a
> keystroke. But can it be made to act only on current line (joining
> it to the next line)?

No, you have to select a block if you want to restrict the scope. You
could try the following untested definition which requires the cursor
to be in the first line:

BlockClear() StartOfTLine() BlockMark_Continous() CaretDown()
EndOfTline() BlockMark_Continous() Replace(<parameters go here>)

# search with not ~
> Well, this one (as I hypothesised) didn't work, for example:
>
> Replace("(~\x0D)\x0A\x0D","\l",text,noline,nocase)
>
> When presented with a string of 0D0A 0D0A 0D0A, it simply treated it as
> two 0A0Ds with a leading 0D and trailing 0A.  The first 0D, and the
> third, were supposed to have returned an invalid match.  What went
> wrong?

It would seem that you are misunderstanding how ~ works.

When the search expression is matched against the string 0D0A 0D0A 0D0A
the following happens:

 - ~\x0D is matched against 0D, this fails so the matching stops and
   StrongED moves to the next character in the string, ie 0A

 - Matching ~\x0D against 0A succeeds as does the rest of the search
   expression so the replacement is made.

 - The search continues after the replacement string. If * is the
   replacement string then the situation now looks like this:
   0D*0A 0D0A
      ^ matching continues here

 - The match at the continuation point succeeds so another replacement
   is made, resulting in 0D**0A.

 - The next match fails as there's only a single 0A left.

Hopefully this clarifies how ~ works. If not, then don't hesitate to ask
further questions.

Cheers, Fred.
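
The walk-through above can be reproduced with Python's re module, whose
negative lookahead behaves in a similar way to ~. This is only an
analogy to help follow the steps, not StrongED itself; the function
name is hypothetical.

```python
import re

# (?!\r) stands in for ~\x0D: a test that consumes no characters.
PATTERN = re.compile(r"(?!\r)\n\r")

def trace(data: str, replacement: str = "*") -> str:
    # Scanning resumes after each match, as described in the steps above.
    return PATTERN.sub(replacement, data)

print(repr(trace("\r\n\r\n\r\n")))
```

The leading 0D and trailing 0A are left over, with two replacements in
between, as in the 0D**0A result described.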