Main styling and markup formatting languages for XHTML: a few notes
My goal
If you have been reading me since the beginning, you know that I am very interested in typography and web design. By the way, uZine has updated its Petit guide typographique à l’usage de l’internet (a short typographic guide for Internet use), which is referenced in the Typographie entry of the French-language Wikipedia. When I discovered advanced typography on the web, I saw many rules… too many. The idea quickly came to me to build a tool (or a set of tools), a kind of advanced text filter, that would make these rules easy to handle. But I did not have the time. Last week, I started looking at this problem again.
To begin with, you should know that this engine will be one of the first building blocks of a general framework for web applications (targeting CMSs). This object framework will be coded in PHP 5, but low-level processing (like string filters) will be done in PHP 4, so that I can use these routines with many web hosting services.
My plan for the coming days is to release a pre-release version of my typography engine as a WordPress plugin. This filter will be used on my new scientific weblog (powered by WordPress, which relies on PHP), not on this personal one (powered by Movable Type 2.661, in Perl).
I did not want to start from scratch, so I surveyed the existing tools. This kind of text-processing tool for the web is very recent: if you search for “typography” on freshmeat.net, you will not find anything.
But being an avid weblog reader, I already knew many styling and markup formatting languages, Textile and Markdown+SmartyPants among others.
Unfortunately, these powerful scripts are not modular and are coded in a very hackish way: unreadable and unmaintainable.
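To make the point about modularity concrete, here is a minimal sketch of what a modular filter engine could look like: each typographic rule is an independent function from string to string, and a chain applies the rules in order. It is written in Python only for illustration (the engine described above is planned in PHP), and the rule names `collapse_spaces`, `ellipsis` and `french_nbsp` are hypothetical. The French rule is deliberately simplified: it uses a plain no-break space before ; : ! ? and only when a space is already there.

```python
import re
from typing import Callable, Iterable, List

# A filter is any function that takes a string and returns a string.
Filter = Callable[[str], str]


def collapse_spaces(text: str) -> str:
    """Hypothetical rule: collapse runs of spaces and tabs into one space."""
    return re.sub(r"[ \t]+", " ", text)


def ellipsis(text: str) -> str:
    """Hypothetical rule: replace three dots with the ellipsis character."""
    return text.replace("...", "\u2026")


def french_nbsp(text: str) -> str:
    """Hypothetical, simplified rule: make the space before ; : ! ? a no-break space."""
    return re.sub(r" ([;:!?])", "\u00a0\\1", text)


def make_chain(filters: Iterable[Filter]) -> Filter:
    """Compose independent filters, applied left to right, into one filter."""
    steps: List[Filter] = list(filters)

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


typography = make_chain([collapse_spaces, ellipsis, french_nbsp])
print(repr(typography("Bonjour   le monde ... vraiment ?")))
```

With this kind of design, a rule can be tested, reordered or removed without touching the others, which is precisely what monolithic scripts make difficult.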
Textile
The original coding: Dean Allen's PHP version
Dean Allen is the creator of Textile.
A part of Dean's Textile galaxy:
- TextDrive: “TextDrive is a hosting company run by and for people who love publishing on the web.” (example of an adoption in TextDriven on bradchoate.com);
- Textpattern aka TXP, “a flexible, elegant, easy-to-use content management system for all kinds of websites, even weblogs”, with the very complete Textpattern Support Forum;
- the Textile tool on Textism (weblog of Cardigan Industries), with help available on the Textpattern Support Forum / Textile.
Entries from Dean Allen about Textile:
- 20030708 – Textile 2 (news);
- 20030527 – Restoration (news);
- 20030221 – Textile (news);
- 20030113 – Pomegranate (news and first appearance of Detextile);
- 20030107 – Textpattern (announcement of Textpattern);
- 20021227 – Structure (structural meaning of the cite element for citation, the em element for emphasis and the i element for mere style);
- 20021221 – Coming Along (news);
- 20021217 – Five (news);
- 20021213 – Textile (first announcement of Textile);
- 20010817 – Typography for Writers (the founding essay behind Textile and Textpattern; all I can say is that Dean Allen has failed to accomplish what he dreamed of).
Brad Choate's Perl port
Textile has been ported to many languages; the most notable port is the Perl one, by Brad Choate.
Around Textile formatting for Six Apart's Movable Type, aka MT-Textile:
- Movable Type User Manual: Textile Release 1.1 (covering Textile 1);
- Movable Type User Manual: Textile 2 (documentation of the syntax supported by version 2);
Blog entries:
- 20040219 – MT-Textile 2.0.2 retouch;
- 20040218 – MT-Textile 2.0.2;
- 20040205 – MT-Textile does that???;
- 20040204 – Don't look now…;
- 20030708 – Thither MT-Textile 2 (beta) (Textile 2 on Textism);
- 20030630 – Whither MT-Textile 2?;
- 20030527 – MT-Textile 2.0 beta;
- 20030523 – MT-Textile 2 ready for testing.
The PHP back-port of the Perl version by Jim Riggs
Brad Choate's version of Textile has been ported to PHP by Jim Riggs, who brought us TextilePHP. TextilePHP is also available as a WordPress plugin, developed by Adam Gessaman. Adam Messinger wrote a list of Character Macros for Textile 2 (see the entry Documentation for Textile 2 Character Macros: a Cheat Sheet for the complete story).
Another WordPress plugin is available on Huddled Masses (entry WordPress Plugin - Textile 2.0). There are actually two plugins there: one for the Textile 1 syntax and one for the Textile 2 syntax.
Markdown + SmartyPants
The original coding: John Gruber's Perl version
Everybody knows SmartyPants and Markdown, two text filters developed by John Gruber:
- “SmartyPants is a free web publishing plug-in […] that easily translates plain ASCII punctuation characters into 'smart' typographic punctuation HTML entities. This means you can write, edit, and save your posts using plain old ASCII straight quotes, plain dashes, and plain dots, but your published posts (and final HTML output) will appear with smart quotes, em-dashes, and proper ellipses.”
- “Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).”
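To show what translating plain ASCII punctuation into typographic entities means in practice, here is a deliberately naive Python sketch. It is not SmartyPants' actual algorithm: the real filter understands HTML, skips the contents of tags such as pre and code, and handles many more cases. The function name `educate` is hypothetical; the numeric entities are the standard ones for the ellipsis, the em-dash and curly quotes.

```python
import re

# Lookbehind: a quote is "opening" at the start of the text or after
# whitespace, "(" or "[" (i.e. NOT after any other character).
_OPEN_CONTEXT = r"(?<![^\s(\[])"


def educate(text: str) -> str:
    """Naive SmartyPants-like filter for plain text (no HTML handling)."""
    text = text.replace("...", "&#8230;")                  # ellipsis
    text = text.replace("--", "&#8212;")                   # em-dash
    text = re.sub(_OPEN_CONTEXT + '"', "&#8220;", text)    # opening double quote
    text = text.replace('"', "&#8221;")                    # remaining: closing double quote
    text = re.sub(_OPEN_CONTEXT + "'", "&#8216;", text)    # opening single quote
    text = text.replace("'", "&#8217;")                    # remaining: apostrophe or closing single quote
    return text


print(educate('"Don\'t," she said -- "wait..."'))
```

The order of the rules matters: a quote is classified as opening only from its left context, and everything left over is treated as closing (or as an apostrophe), which is a common simplification.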
SmartyPants and Markdown can both be tested on the Markdown Web Dingus (together or separately).
Markdown has a reverse filter that takes HTML and turns it back into Markdown.
The complete port from Michel Fortin and other PHP ports
These two filters have been ported to PHP:
- Michel Fortin has ported both, so we have PHP SmartyPants and PHP Markdown. Both can be tested on the PHP Markdown Web Dingus (like the original versions, together or separately). PHP Markdown has a Textile compatibility mode, which is really useful. If you want to know more about PHP Markdown, have a look at the entry La petite histoire de PHP Markdown (the short story of PHP Markdown, in French).
- SmartyPants-PHP (the story in the entry SmartyPants-PHP) – this implementation of SmartyPants can be used by the TextilePHP WordPress plugin.
Ports to other languages
js-markdown is a partial implementation of Markdown, written in JavaScript.
There are other ports in many languages.
It's not the end…
Continuation
Michel Fortin wrote an interesting entry about styling languages, Prolifération des langages de style (proliferation of styling languages, in French), in which he identified:
- SPIP “enrichissement typographique” (typographic enrichment; syntax available on the webpage Text formatting shortcuts for SPIP).
- Textile. You can find a colorful reference of Textile, A Textile Reference, on the Hobix website (Hobix is a blogging package in Ruby that uses RedCloth, a Textile engine in Ruby, to render the Textile styling language). Note that there is also a Python port of Textile: PyTextile.
- Markdown (+SmartyPants).
- the Wikipedia syntax (markup and help for editors).
- ReStructuredText (quickref here, quickstart here, and French quickstart here).
ReStructuredText is in fact an alternative to StructuredText, developed as a Python docstring standard (more on the Structured Text Wiki). StructuredText itself is an evolution of Setext.
Following the entry Markdown on The Tao of Mac, I discovered XhtmlForWiki. The person behind this page plans to write a universal markup converter, MarkupToMarkup. Other wiki-style syntaxes and converters include:
- Wiki2xhtml (“Wiki2xhtml is a PHP class used to transform text written with a Wiki syntax into valid XHTML.”);
- DotClear (a wiki syntax);
- txt2tags syntax (“Txt2tags is a document generator. It reads a text file with minimal markup […] and converts it to […] [many] formats”);
- Radeox (“Radeox API is a lightweight wiki markup rendering engine API to make render engines for wikis more portable.”).
And I am sure I have forgotten many other languages!
Small text tools are far more widespread
Small tools are sometimes very useful. Matthew Mullenweg (creator of WordPress) developed many of them:
- New Lines to Paragraphs (“By far the most popular code in the site, and has been widely adapted in different projects. Basically it takes PHP's nl2br function to the logical next step and converts double line breaks to paragraphs where applicable, does line breaks as before, and best of all it's aware of block-level HTML tags so it won't mess up your page.”);
- PHP Acronym Definer (“When you run your text through this code it will define all the acronyms it can using the acronym tag. It also has a few other niceties, so check it out.”);
- Cardinal Endings for Numbers (“Adds cardinal endings to numbers, like 1st and 2nd, and doesn't do much else. Has an option or two that might make this useful for you.”);
- Curly Quotes Function in PHP (“The predecessor to the Texturize function. Soon to be totally deprecated. Has some neat regular expressions if you want to check it out though.”) and Curly Quotes for Movable Type (“Implementation of the curly quotes function in Perl using Brad Choate's regex plugin. As far as I know this was the first code of its type for Movable Type, though now there are some better alternatives for that system such as Smarty Pants and Textile. If there's interest I'll look into porting Texturize to Perl since it does a bit [more] typographically than either of those systems.”); more in the article Em and En Dashes in Movable Type on Photo Matt.
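The principle behind New Lines to Paragraphs is easy to sketch: split the text on blank lines, wrap each chunk in a paragraph, and turn the remaining single line breaks into line break tags. The Python version below is a hypothetical `autop` function, not Matt's code; its block-level awareness is reduced to leaving alone any chunk that already starts with one of a few block-level tags.

```python
import re

# A small, non-exhaustive set of block-level tags that must not be wrapped in <p>.
BLOCK_TAGS = ("p", "div", "pre", "blockquote", "ul", "ol", "li", "table",
              "h1", "h2", "h3", "h4", "h5", "h6")
_BLOCK_START = re.compile(r"^<(?:%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def autop(text: str) -> str:
    """Convert double line breaks to paragraphs and single ones to <br />."""
    text = text.replace("\r\n", "\n").strip()
    # One or more blank lines separate two chunks.
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
    out = []
    for chunk in chunks:
        if _BLOCK_START.match(chunk):
            out.append(chunk)  # already a block element: leave it alone
        else:
            out.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n\n".join(out)


print(autop("Hello\nworld\n\n<pre>code</pre>\n\nLast paragraph"))
```

A real implementation also has to cope with inline tags spanning several lines and with nested block elements, which this sketch ignores.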
But he went much further by developing a Textile-like tool, Texturize (“the first automagic quote ‘curlifier’”), used in WordPress (some information in the entry Texturize Finished).
It would also be interesting to have linguistic statistics, like those offered by many Perl modules, for example Lingua-EN-Syllable (“Routine for estimating syllable count in words”) or Lingua-EN-Fathom (“readability and general measurements of English text”).
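To give an idea of what such statistics involve, here is a rough Python sketch of a syllable estimator and of the Flesch Reading Ease score, one of the readability measures Lingua-EN-Fathom reports: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The vowel-group heuristic is a simplification chosen for illustration, not the algorithm of Lingua-EN-Syllable, so its counts will be wrong for many words; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate based on groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent "e" (as in "make") usually adds no syllable, but a final
    # "-le" after a consonant (as in "table") usually does.
    if word.endswith("e") and count > 1 and not re.search(r"[^aeiouy]le$", word):
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 2))
```

Very short sentences made of one-syllable words can even score above 100, which shows the formula is only meaningful on reasonably long samples.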
To be continued…
This entry was mainly focused on styling languages and structural markup. In doing so, I left aside interesting subjects like code beautifying (with syntax highlighters like Beautifier), referencing (purple numbers, citations, etc.), and knowledge management with a wiki (“translating Wiki formatted text into other formats” with tools like Text::WikiFormat, managing a weblog with a wiki as described in WeblogWithWikiDiscussion, or the opposite, managing a wiki with a weblog engine, for example using MTWikiFormatPlugin). Upcoming entries will deal with these domains, starting with an overview of the most appealing weblog plugins for my project (found in the Movable Type Plugin Directory or in the WordPress Wiki Plugins section).
Posted by Jean-Philippe on December 23, 2004