Main styling and markup languages for XHTML: a few notes

December 23, 2004

My goal

If you have been reading me since the beginning, you know that I am very interested in typography and web design. (Incidentally, uZine has updated its Petit guide typographique à l’usage de l’internet, which is referenced in the Typographie entry of the French-language Wikipedia.) When I discovered advanced typography on the web, I saw many rules… too many. The idea soon came to build a tool (or a set of tools), a kind of advanced text filter, that would make these rules easy to apply. But I did not have the time. Last week, I started to look at this problem again.

To begin with, you should know that this engine will be one of the first building blocks of a general framework for web applications (targeting CMSs). This object-oriented framework will be coded in PHP 5, but low-level processing (such as string filters) will be done in PHP 4, so that I can use these routines with many web hosting services.

My plan for the coming days is to release a preliminary version of my typography engine as a WordPress plugin. Indeed, this filter will be used on my new scientific weblog (powered by WordPress, which relies on PHP), not on this personal one (powered by Movable Type 2.661, written in Perl).

I did not want to start from scratch, so I investigated what already exists. Text processing tools of this kind for the web are very recent. If you search for “typography” on freshmeat.net, you will not find anything.

But being an avid weblog reader, I already knew several styling and markup languages, Textile and Markdown+SmartyPants among others.

Unfortunately, these powerful scripts are not modular and are really badly coded, that is to say in a very hackish way: unreadable and unmaintainable.

Textile

The original coding: Dean Allen's PHP version

Dean Allen is the creator of Textile.

Brad Choate's Perl bounce

Textile has been ported to many languages, mainly Perl, thanks to Brad Choate.

Jim Riggs's PHP back-port of the Perl version

Brad Choate's version of Textile has been ported back to PHP by Jim Riggs, who gave us TextilePHP. TextilePHP is also available as a WordPress plugin, developed by Adam Gessaman. Adam Messinger wrote a list of Character Macros for Textile 2 (see the entry Documentation for Textile 2 Character Macros: a Cheat Sheet for the complete story).

Other WordPress plugins are available on Huddled Masses (entry WordPress Plugin - Textile 2.0): one for Textile 1 syntax and one for Textile 2 syntax.

Markdown + SmartyPants

The original coding: John Gruber's Perl version

Everybody knows SmartyPants and Markdown, two text filters developed by John Gruber:

  • “SmartyPants is a free web publishing plug-in […] that easily translates plain ASCII punctuation characters into 'smart' typographic punctuation HTML entities. This means you can write, edit, and save your posts using plain old ASCII straight quotes, plain dashes, and plain dots, but your published posts (and final HTML output) will appear with smart quotes, em-dashes, and proper ellipses.”
  • “Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).”
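To make the idea of a punctuation filter concrete, here is a minimal sketch in Python. It is not John Gruber's code, and it does not reproduce SmartyPants' actual rules: the function name `smarten`, the choice of entities, and the convention that two hyphens give an en dash and three give an em dash are all assumptions made for this illustration. It also ignores HTML tags, which a real filter has to skip.

```python
import re

def smarten(text: str) -> str:
    """Replace plain ASCII punctuation with typographic HTML entities.

    Hypothetical, simplified rules for illustration only.
    """
    # Longest sequences first, so "---" is not consumed as "--" plus "-".
    text = text.replace("---", "&#8212;")  # em dash (assumed convention)
    text = text.replace("--", "&#8211;")   # en dash (assumed convention)
    text = text.replace("...", "&#8230;")  # ellipsis

    # A double quote at the start of the text, or right after whitespace or
    # an opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[{])"', r"\1&#8220;", text)
    # Every remaining double quote is treated as a closing quote.
    text = text.replace('"', "&#8221;")

    # Same approach for single quotes; an apostrophe inside a word
    # ("it's") ends up as a closing single quote, which is what we want.
    text = re.sub(r"(^|[\s(\[{])'", r"\1&#8216;", text)
    text = text.replace("'", "&#8217;")
    return text

print(smarten('"Hello" -- it\'s here...'))
```

Even this toy version shows why the order of the rules matters, and why such filters quickly turn into long chains of regular expressions.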

SmartyPants and Markdown can both be tested on the Markdown Web Dingus (together or separately).

Markdown also has a reverse filter, which takes HTML and turns it back into Markdown.

The complete port from Michel Fortin and other PHP ports

These two filters have been ported to PHP:

Ports to other languages

js-markdown is a partial implementation of Markdown, written in JavaScript.

There are other ports in many languages.

It's not the end…

Continuation

Actually, Michel Fortin wrote an interesting entry about styling languages, Prolifération des langages de style (in French), where he identified:

ReStructuredText is in fact an alternative to StructuredText being developed as a Python docstring standard (more on the Structured Text Wiki). StructuredText itself is an evolution of Setext.

Following the entry Markdown on The Tao of Mac, I discovered XhtmlForWiki. The guy behind this page plans to code a global markup converter, MarkupToMarkup.

  • Wiki2xhtml (“Wiki2xhtml is a PHP class that transforms text written in a wiki syntax into valid XHTML.”);
  • DotClear (a wiki syntax);
  • txt2tags syntax (“Txt2tags is a document generator. It reads a text file with minimal markup […] and converts it to […][many] formats:”);
  • Radeox (“Radeox API is a lightwight wiki markup rendering engine API to make render engines for wikis more portable.”).

And I am sure I have forgotten many other languages!

Small text tools are far more widespread

Small tools are sometimes very useful. Matthew Mullenweg (creator of WordPress) developed a lot of them:

  • New Lines to Paragraphs (“By far the most popular code in the site, and has been widely adapted in different projects. Basically it takes PHP's nl2br function to the logical next step and converts double line breaks to paragraphs where applicable, does line breaks as before, and best of all it's aware of block-level HTML tags so it won't mess up your page.”)
  • PHP Acronym Definer (“When you run your text through this code it will define all the acronyms it can using the acronym tag. It also has a few other niceities, so check it out.”)
  • Cardinal Endings for Numbers (“Adds cardinal endings to numbers, like 1st and 2nd, and doesn't do much else. Has an option or two that might make this useful for you.”)
  • Curly Quotes Function in PHP (“The predecessor to the Texturize function. Soon to be totally depreciated. Has some neat regular expressions if you want to check it out though.”) and Curly Quotes for Movable Type (“Implementation of the curly quotes function in Perl using Brad Choate's regex plugin. As far as I know this was the first code of its type for Movable Type, though now there are some better alternatives for that system such as Smarty Pants and Textile. If there's interest I'll look into porting Texturize to Perl since it does a bit typographically than either of those systems.”) – more in the article Em and En Dashes in Movable Type on Photo Matt.
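Two of the tools listed above are small enough to sketch in a few lines. The Python below is my own illustration, not Matthew Mullenweg's PHP code: the function names `newlines_to_paragraphs` and `ordinal` are hypothetical, and the paragraph filter leaves out the awareness of block-level HTML tags that the original description mentions.

```python
import re

def newlines_to_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks in <p> tags and turn the remaining
    single line breaks into <br /> (simplified: no block-level tag handling)."""
    text = text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = [
        "<p>" + block.strip().replace("\n", "<br />\n") + "</p>"
        for block in blocks
        if block.strip()
    ]
    return "\n".join(paragraphs)

def ordinal(n: int) -> str:
    """Return a non-negative integer with its English ending (1st, 2nd, 3rd, 4th...)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

print(newlines_to_paragraphs("one\ntwo\n\nthree"))
print([ordinal(n) for n in (1, 2, 3, 4, 11, 21, 112)])
```

Keeping each filter as a small, independent function like this is what makes it possible to chain them in any order.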

But he went much further by developing a Textile-like tool: Texturize (“the first automagic quote ‘curlifier’”), used in WordPress (some information in the entry Texturize Finished).

It would also be interesting to have linguistic statistics, like those offered by many Perl modules, for example Lingua-EN-Syllable (“Routine for estimating syllable count in words”) or Lingua-EN-Fathom (“readability and general measurements of English text”).
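As a rough idea of what a syllable estimator can look like, here is a naive sketch in Python. It is not the algorithm used by Lingua-EN-Syllable, which I have not reproduced: it simply counts groups of consecutive vowels, with one assumed rule for a silent final "e". The function name `estimate_syllables` is hypothetical, and the result is only an approximation for English words.

```python
import re

def estimate_syllables(word: str) -> int:
    """Estimate the number of syllables in an English word (naive heuristic)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Assumed rule: a final "e" is usually silent ("code"), except in a
    # final "le" ("table"), where it carries a syllable.
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    # Each run of consecutive vowels (including "y") counts as one syllable.
    vowel_groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(vowel_groups))

for w in ("typography", "table", "code", "readability"):
    print(w, estimate_syllables(w))
```

Readability measures such as those in Lingua-EN-Fathom are built on counts of this kind (words, sentences, syllables), which is why even an approximate estimator is a useful building block.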

To be continued…

This entry was mainly focused on styling languages and structural markup. In doing so, I left aside interesting subjects like code beautifying (with syntax highlighters like Beautifier), referencing (purple numbers, citations, etc.), knowledge management with a wiki (“translating Wiki formatted text into other formats” with tools like Text::WikiFormat, managing a weblog with a wiki as described in WeblogWithWikiDiscussion, or the opposite, managing a wiki with a weblog engine, for example using MTWikiFormatPlugin), etc. Upcoming entries will deal with these domains, starting with an overview of the most appealing weblog plugins for my project (found on the Movable Type Plugin Directory or in the WordPress Wiki Plugins section).

Posted by Jean-Philippe on December 23, 2004 at 01:44 PM 13 Comments, 6462 TrackBacks

Filed in (X)HTML, Smartypants & Textile, web development
