Summary of
WikiUseful TextFormattingRegularExpressions:
- Expressions for inline tags
Quoted emphasis:
('')(.*?)\1 -> <em>$2</em>
Quoted strong:
(''')(.*?)\1 -> <strong>$2</strong>
- Expressions for block level tags
Preformatted code (space at beginning of line):
^ +(.*)\n -> <pre>$1</pre>
with a subsequent match and replace of
</pre><pre> ->
- Expressions for nested group tags (such as bullets)
Ordered list (tab-numeral-space at beginning of line):
^\t[0-9]\s*(.*)(?:\n?|$) -> <li>$1</li>
with a subsequent match and replace of
(?<!</li>)((?:<li>.*</li>)+)(?!<li>) -> <ol>$1</ol>
Unordered list (tab-asterisk-space at beginning of line with consideration for previous unordered list parsing):
^\t\*\s*(.*)(?:\n?|$) -> <li>$1</li>
with a subsequent match and replace of
(?<!</li>|<ol>)((?:<li>.*</li>)+)(?!<li>|</ol>) -> <ul>$1</ul>
- Expressions for parsing links
standard protocols, not image:
(http|news|ftp|mailto)\:(\/\/)?((?:\S(?!\.gif|\.jpg|\.jpeg))+)\s --> <a href="$1:$2$3">$3</a>
standard protocols, inline image:
(?<!href=")(http|news|ftp|mailto)\:(\/\/)?((?:\S(?!\.gif|\.jpg|\.jpeg|\.png))+\S)(\.gif|\.jpg|\.jpeg|\.png)
--> <img src="$1:$2$3$4" border="0">
- Expressions for alternate formatting rules (i.e. rules not native to TheOriginalWiki)
The above are the rules used for
HtagWiki. I would be really interested to know what Ward uses, so we could analyse posssible problems with the various
RegularExpressions. I also started to realise that after a certain number of rules it starts getting inefficient, and of course the fastest method (I can think of) would be to implement a custom
WikiParser. Doesn't seem to be an issue yet though. --
SvenNeumann
The following paragraph was moved from
TextFormattingRegularExpressions.
This page could have been really, genuinely useful to my endeavors the last two days and I was very happy when I found it. Unfortunately I must say I find many of the expressions here dodgy at best, and just wrong at worst. Could we turn this into a resource for the (many) ways that
WikiAuthors have implemented this? --
SvenNeumann
Let's use this page to discuss the
RegularExpressions and use the other page as a document.
Great idea! Maybe I'm just a bit slow, but some of these RegExes give me a splittin' headache :)
The emphasis
RegularExpression /'{3}(.*?)'{3}/ looks like it would match situations where the apostrophes contain nothing. Wouldn't
/'{3}(.*)'{3}/ be simpler? And horizontal line
RegularExpression should only match
exactly 4 dashes not 4 or more. --
AdewaleOshineye
Also in
HtagWiki I use the backreferencing to make it easy to match several alternate formatting rules like:
(''|//)(.*?)\1
for example to allow the slash notation too in one simple expression. Also, I believe matching blank apostrophe parts is accuarate too, as long as the rules are ordered with the expression gobbling the most quotes first. -- sn
Let's try this one:
^ +(.*)\n','<pre>$1</pre>
to mark all monospaced lines and
</pre><pre>
for deletion after to merge the lines into a single block. is that accurate?
What is an efficient quick and dirty way for the spell checker? I tried a
RegularExpression but that took over 14seconds for a small page and a 270k dictionary. A simple search for each word takes only 0.4seconds. Of course indexing would speed this up. But perhaps I'm approaching this from the wrong end too. How is simple spell checking usually implemented? -- sn
By the users... it's a wiki, after all.