
Data format



I've looked through the proposal that's on the linuxunited web site and
I have a couple of ideas.

Specifically, a standard exists that will allow you to describe the
content of articles in a very meaningful way: XML.  This would offer a
great deal more flexibility than the current system.  The current system
as I've read it uses a header to describe its contents, something like
this:

[begin_header]
[submitter] Christopher Blizzard
[approved] Someone Else
[title] Linux is pretty pretty cool
[category] propaganda

This would be better expressed in XML:

<ARTICLE>
<SUBMIT>
<NAME>Christopher Blizzard</NAME>
<EMAIL>blizzard@appliedtheory.com</EMAIL>
</SUBMIT>
<APPROVED>
<SITE>
<URL SITE="http://www.lwn.net">Linux Weekly News</URL>
</SITE>
<PERSON>
Christopher Blizzard
</PERSON>
</APPROVED>
<CATEGORY>
Propaganda
</CATEGORY>
<ARTICLE-BODY>
Today <URL SITE="http://www.linux.org">Linux</URL> creator <EMAIL
ADDRESS="torvalds@transmeta.com">Linus Torvalds</EMAIL> stated that he thought
that <QUOTE>"Linux is pretty cool."</QUOTE>
</ARTICLE-BODY>
</ARTICLE>

This may seem a little verbose, but it does a really good job of
describing the data involved.  Also, it doesn't dictate how the data is
stored in the backend.  It can be stored as a news spool file, in a
relational database or in flat files.

More importantly, it does more to describe the data in the article.  A
couple of examples:

Many news sites have a "related sites" section or "related people"
section.  This could be automatically generated from anything that has
the <URL> tag around it.  Any distribution format that doesn't support
markup, email or news for example, could easily just strip out the tags
and leave the textual data in place.
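
As a rough sketch of both ideas, here is some Python using the standard
xml.etree.ElementTree module.  The article fragment, the SITE attribute
name and the two helper functions are my own assumptions for
illustration; nothing here is part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment, modelled on the example in this message.
ARTICLE = (
    '<ARTICLE-BODY>Today <URL SITE="http://www.linux.org">Linux</URL> '
    'creator Linus Torvalds stated that he thought that '
    '<QUOTE>"Linux is pretty cool."</QUOTE></ARTICLE-BODY>'
)

def related_sites(root):
    """Return (address, link text) pairs for every URL element."""
    return [(el.get("SITE"), "".join(el.itertext())) for el in root.iter("URL")]

def plain_text(root):
    """Drop all markup and keep only the text, e.g. for email or news."""
    return "".join(root.itertext())

root = ET.fromstring(ARTICLE)
print(related_sites(root))  # candidates for a "related sites" box
print(plain_text(root))     # the same article with the tags stripped
```

The same loop over <EMAIL> or <PERSON> elements would give a "related
people" list.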

Also, this allows you to apply styles, à la style sheets, to the format of
the data.  In the example above, for a web page you could automatically
put an <I> tag around anything that has the <QUOTE> tag around it, or
generate one of those big "QUOTE"s on the side of an article for
something that's particularly revealing.  A lot of sites do this too.
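
A minimal sketch of that kind of styling pass, again in Python with
xml.etree.ElementTree.  The sample body, the to_html name and the choice
of <P> as the wrapper are assumptions of mine, not anything defined by
the proposal; a real site would more likely use a proper style sheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical article body for illustration.
BODY = '<ARTICLE-BODY>He said <QUOTE>"Linux is pretty cool."</QUOTE> today.</ARTICLE-BODY>'

def to_html(root):
    """Turn QUOTE elements into <I> and collect their text as pull quotes."""
    pull_quotes = []
    for el in list(root.iter("QUOTE")):
        pull_quotes.append("".join(el.itertext()))
        el.tag = "I"       # inline style for the web page
    root.tag = "P"         # wrap the body in a paragraph
    return ET.tostring(root, encoding="unicode"), pull_quotes

html, quotes = to_html(ET.fromstring(BODY))
print(html)    # body with quotes italicised
print(quotes)  # text for the big quote boxes on the side
```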

Because of XML's extensibility, you can also always apply your own
tags and styles to the data without breaking anyone else's site. 
That means that if you want to add additional markup to some piece of
arbitrary text, you can.
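
To show why extra tags needn't break anyone, here is a sketch of a
renderer that styles only the tags it knows about and passes everything
else through as plain text.  The <VERSION> tag, the KNOWN table and the
render function are hypothetical names I made up for the example.

```python
import html
import xml.etree.ElementTree as ET

KNOWN = {"QUOTE": "I"}  # tags this hypothetical site knows how to style

def render(el):
    """Render el's content; unknown tags are dropped but their text is kept."""
    parts = [html.escape(el.text or "", quote=False)]
    for child in el:
        inner = render(child)
        html_tag = KNOWN.get(child.tag)
        parts.append(f"<{html_tag}>{inner}</{html_tag}>" if html_tag else inner)
        parts.append(html.escape(child.tail or "", quote=False))
    return "".join(parts)

# Another site added its own <VERSION> tag; this renderer just ignores it.
body = ET.fromstring(
    "<ARTICLE-BODY>Kernel <VERSION>2.2</VERSION> is "
    "<QUOTE>pretty cool</QUOTE>.</ARTICLE-BODY>"
)
rendered = render(body)
print(rendered)
```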

Additionally, this system allows your current system of storage to
remain intact.  I personally use a MySQL database for storing articles. 
Since relational databases tend to put very strict requirements on data,
generating XML from them is a piece of cake.
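
Roughly what I mean, sketched in Python.  I'm using the standard sqlite3
module as a stand-in for MySQL so the example runs on its own, and the
articles table, its columns and row_to_article are all assumptions for
illustration.  The body is stored as plain text here; inline markup
would need to be stored and parsed separately.

```python
import sqlite3
import xml.etree.ElementTree as ET

def row_to_article(row):
    """Build an ARTICLE element from a (name, email, body) row."""
    name, email, body = row
    article = ET.Element("ARTICLE")
    submit = ET.SubElement(article, "SUBMIT")
    ET.SubElement(submit, "NAME").text = name
    ET.SubElement(submit, "EMAIL").text = email
    ET.SubElement(article, "ARTICLE-BODY").text = body
    return article

# In-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (name TEXT, email TEXT, body TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (?, ?, ?)",
    ("Christopher Blizzard", "blizzard@appliedtheory.com", "Linux is pretty cool."),
)

row = conn.execute("SELECT name, email, body FROM articles").fetchone()
xml_text = ET.tostring(row_to_article(row), encoding="unicode")
print(xml_text)
```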

I'll generate an XML DTD within the next couple of hours.

--Chris

-- 

------------
Christopher Blizzard
http://odin.appliedtheory.com/
------------