
Re: SEUL: Underlying tech/representation of the web page

> I was thinking about how we want to author our pages.  I think you
> all will agree that just writing everything in raw HTML is a bad
> idea.
> I don't have a lot of experience with this stuff.  I have used m4
> in the past, but while it worked okay I wouldn't say it's great or
> anything.
> I think we should really try to make something that is
> software-neutral.  m4 files depend on m4, php depends on php, dtml
> depends on Zope, etc.  But the transformations that turn the source
> into formatted HTML aren't the hard part; the hard part is writing
> all the source (content).  I think we should make something that
> is neutral of the program(s) we use to transform it into HTML.
> For that reason, I think XML could be a good idea.  Something like
> xtract <http://www.xmlscript.org> can do a transformation to HTML
> pretty well, and I'm sure many new programs will show up in the
> future.  A good, fully semantic representation of the content
> will stay useful for a long time without heroic efforts, and will
> let us give it whatever look we want, including multiple looks in
> case we can't decide.  XML seems a good semantic representation to
> use.
> Well, those are my thoughts.  Of course, even if we settle on XML
> that's only the beginning, since we'd need to consider what
> tags to use... but that's another issue.
Long ago, I wrote a program called 'sdoc' (the seul document parser)
that did this. Basically, you could define your own tags, and a little
perl parser would go through, parse the tags, and run arbitrary perl
code over the stuff inside the tags. It was fun, easy, simple to extend,
etc. We have this implemented on the seul site, such that if you commit
a file that ends in .sdoc, it will run sdoc over it and produce an html
file. For instance, look at pub/public_html/whatsnew/index.sdoc in the
cvs repository (you may have to checkout pub if you previously only
checked out edu). You can see the .html that it produces at
That only uses a couple of the sdoc tags...it's easy to make more.
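
To make the "define your own tags, run code over what's inside them" idea
concrete, here is a rough sketch. It is not sdoc itself (sdoc is Perl and
its source isn't shown here); it is a small Python illustration of the same
general technique. The tag names (title, item, news) and the HANDLERS table
are made up for the example.

```python
import re

# Hypothetical handlers: tag name -> function taking the (already
# expanded) text inside the tag and returning HTML.
HANDLERS = {
    "title": lambda body: "<h1>" + body.strip() + "</h1>",
    "item": lambda body: "<li>" + body.strip() + "</li>",
    "news": lambda body: "<ul>" + body.strip() + "</ul>",
}

# <name>...</name> with a non-greedy body. Limitation: a tag nested
# inside another tag of the same name is not handled correctly.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def expand(text, handlers=HANDLERS):
    """Expand every known tag in text, innermost content first."""

    def replace(match):
        name, body = match.group(1), match.group(2)
        inner = expand(body, handlers)
        handler = handlers.get(name)
        if handler is None:
            # Unknown tags (plain HTML, say) pass through unchanged.
            return "<%s>%s</%s>" % (name, inner, name)
        return handler(inner)

    return TAG.sub(replace, text)


# prints <ul><li>first</li><li>second</li></ul>
print(expand("<news><item>first</item><item>second</item></news>"))
```

Adding a tag in this sketch means adding one entry to the table, which is
roughly the "easy to make more" property described above.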

It also turns out that sdoc can parse xml files. I guess I just designed
it well. :) Of course, sdoc doesn't require the headers from xml files,
so it's more flexible in that respect. On the other hand, the idea of
having different DTDs isn't supported, so in that respect it's less
flexible. (It can still be done, it just requires some perl hacking.)

Summary: if you can get an xml parser that is easy to use and simple,
then we should use it. It really depends how much power you need in
terms of expanding tags. I feel that xml is "just the beginning" of what
sdoc offers. But we should move away from sdoc because it's old and I'm
the one who would maintain it. :) (Or we should publish it, but I'd feel
bad doing that without first making it actually xml compliant.)
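
For comparison, the "just use an xml parser" route might look something
like the sketch below. It uses Python's standard xml.etree.ElementTree; the
tag names (page, heading, para) and the RULES table are invented for the
example and are not a proposal for our actual tag set.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical "look": tag name -> function wrapping the rendered body.
RULES = {
    "page": lambda body: "<html><body>" + body + "</body></html>",
    "heading": lambda body: "<h1>" + body + "</h1>",
    "para": lambda body: "<p>" + body + "</p>",
}


def render(elem, rules):
    """Render one element: its text, then each child and the text after it."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render(child, rules))
        parts.append(escape(child.tail or ""))
    body = "".join(parts)
    rule = rules.get(elem.tag)
    # Tags without a rule contribute only their content.
    return rule(body) if rule else body


def to_html(source, rules=RULES):
    """Parse an XML string and render it with the given rule table."""
    return render(ET.fromstring(source), rules)


print(to_html("<page><heading>News</heading><para>a &amp; b</para></page>"))
```

Passing a different rule table to to_html gives the same content a
different look, which is the "multiple looks" case from the quoted message.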

Elseif, we should use sdoc.

Urls of interest: