Regex question

Status
Not open for further replies.

OneManCrew

I've run into a problem and wonder if anyone here can help :)


I'm using Notepad++ to sort files with regex, and I have this pattern:

PHP:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

What I want to do is delete everything on the page other than what the regex above matches.


Does any expert know how to do this? I'm puzzled.
 
EDIT: Could've sworn I saw PHP mentioned in your question, but apparently I was wrong. Anyway, the answer below is still more or less valid.

Original answer:
I'm not a PHP expert, but by far the simplest solution is to match all occurrences of the pattern with preg_match_all and copy them, in order, to a temporary string or file.
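For reference, the match-and-collect idea might look roughly like this in Python, where `re.finditer` plays the role of preg_match_all. This is only a sketch: the pattern is the one from the question, while `URL_PATTERN`, `extract_urls` and the sample text are made-up names for illustration.

```python
import re

# Pattern copied from the question; only the variable name is new.
URL_PATTERN = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)


def extract_urls(text):
    """Return every full match of URL_PATTERN, in order of appearance."""
    # group(0) is the whole match; findall() would return the capture groups instead.
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "see http://example.com/a?b=1 and ftp://files.example.org/x.zip, bye"
    # Print one URL per line, ready to be written to a new file.
    print("\n".join(extract_urls(sample)))
```

Collecting the matches this way avoids having to invert the pattern at all, which is why it tends to be the least error-prone route.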

Another option, which is possibly more efficient, is inverting the regex with some kind of negative lookahead, but I'm not sure how lookaheads are implemented in PHP. I can look into it if you want. The PHP manual page on Assertions (I'd post the link but I can't, too few posts, sadly) documents them. Once you have the inverted regex, it's just a matter of calling preg_replace and replacing the matches with an empty string.

Hope this helps you somehow. As I said, I'm not the expert you're looking for :P

EDIT 2: Here's what I think should work:
Code:
(?!(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)
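Use this and replace all matches with empty strings or spaces. One caveat: a lookahead on its own is zero-width, meaning it matches positions rather than characters, so the pattern will probably need a consuming part before a replace actually removes any text.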
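A quick way to see how a lookahead-only pattern behaves in a replace is to try it in Python. The sketch below first runs the bare negative lookahead, then shows one possible replace-based alternative (my own substitution, not something from the thread): match either a URL or any single character, and keep only the URL matches. `BARE_LOOKAHEAD`, `URL_OR_ANY_CHAR` and `keep_only_urls` are hypothetical names.

```python
import re

# Pattern copied from the question.
URL = (
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)

# The lookahead alone consumes no characters, so every match is empty.
BARE_LOOKAHEAD = re.compile("(?!" + URL + ")")

# Alternative: a URL, or else any one character (DOTALL lets "." match newlines).
URL_OR_ANY_CHAR = re.compile(URL + r"|.", re.DOTALL)


def keep_only_urls(text):
    """Drop everything except URL matches, writing one URL per line."""
    # Group 1 (the scheme) is only set when the URL branch matched.
    return URL_OR_ANY_CHAR.sub(
        lambda m: m.group(0) + "\n" if m.group(1) else "", text
    )


if __name__ == "__main__":
    sample = "a http://x.com b"
    # Replacing empty matches with an empty string leaves the text untouched.
    print(BARE_LOOKAHEAD.sub("", sample) == sample)
    print(keep_only_urls(sample))
```

The alternation works because the regex engine tries the URL branch first at each position and only falls back to the single-character branch when no URL starts there.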
If you want to remove all LINES that do not contain URLs, use
Code:
^(?!.*(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?).*
I haven't properly tested the regex, so it might still fail. If it does, let me know :)
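To try the line-based version outside Notepad++, something like the following Python sketch should do. It assumes the URL pattern from the question and adds a trailing `\n?` (my addition) so that rejected lines are deleted outright rather than left behind as blank lines. `LINE_WITHOUT_URL` and `drop_lines_without_urls` are hypothetical names.

```python
import re

# Pattern copied from the question.
URL = (
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)

# With re.MULTILINE, ^ anchors at the start of every line. The negative
# lookahead rejects lines containing a URL; ".*" then consumes the line and
# "\n?" (an assumption, not in the original regex) eats its line break.
LINE_WITHOUT_URL = re.compile(r"^(?!.*" + URL + r").*\n?", re.MULTILINE)


def drop_lines_without_urls(text):
    """Remove every line that contains no match for URL."""
    return LINE_WITHOUT_URL.sub("", text)


if __name__ == "__main__":
    sample = "foo\nhttp://a.com x\nbar\n"
    print(drop_lines_without_urls(sample), end="")
```

Note that this keeps whole lines, so any text surrounding a URL on the same line survives.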
 