
Solution three (notes, not actual description yet): distributed nntp



Definitions:
server: a machine that stores news
user: a person who wants to read news

Each machine that stores news is a server. We use the nntp protocol
(as outlined in http://src.doc.ic.ac.uk/computing/internet/rfc/rfc977.txt
-- in particular, look at section 1.6) to provide a fast reliable
streamed connection between the servers.
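To make the server-to-server exchange concrete, here is a rough sketch of the IHAVE offer that RFC 977 describes, modelled purely in memory with no sockets. The response codes (335, 435, 235) are the ones from the RFC; the Spool class and the feed function are hypothetical names made up for illustration, not part of any existing nntp program.

```python
# Illustrative only: an in-memory model of the RFC 977 IHAVE exchange.
# Spool and feed are hypothetical names, not an existing API.

class Spool:
    """A server's article store, keyed by Message-ID."""

    def __init__(self, name):
        self.name = name
        self.articles = {}

    def ihave(self, message_id):
        """Answer an IHAVE offer with an RFC 977 response code."""
        if message_id in self.articles:
            return 435  # article not wanted, do not send it
        return 335      # send article to be transferred

    def receive(self, message_id, text):
        """Store a transferred article and acknowledge it."""
        self.articles[message_id] = text
        return 235      # article transferred ok


def feed(sender, receiver):
    """Offer every article in sender to receiver; return the IDs sent."""
    sent = []
    for message_id, text in sender.articles.items():
        if receiver.ihave(message_id) == 335:
            receiver.receive(message_id, text)
            sent.append(message_id)
    return sent
```

Since the receiver refuses anything it already holds, running the same feed twice transfers nothing the second time, which is what keeps repeated synchronization cheap.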

This means that each server has two "sections": the first section is
the nntp news spool, which keeps news in the standardized format that
we're going to agree on pretty soon (see
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1036.txt for some
initial thoughts on what might want to go into that format). This section
interacts with other servers in the LNN (linux news network ;) to keep
everybody synchronized. I will go into more detail on this section in a
future message. Newsgroups can be equated to 'news categories', and we
have a hierarchy already set up for us. Note that while we're using nntp
programs, we won't actually be "connected" to the rest of usenet.
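As a starting point for the format discussion, here is a minimal sketch that parses an article and checks for the six headers RFC 1036 lists as required. It assumes we keep RFC 1036 headers unchanged, which we have not agreed on yet; parse_article is a hypothetical helper, and the group names in the usage example are invented.

```python
from email.parser import Parser

# The six headers RFC 1036 lists as required.
REQUIRED_HEADERS = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")


def parse_article(raw):
    """Parse one article and return the fields a news site would need.

    Hypothetical helper: raises ValueError if a required header is missing.
    """
    msg = Parser().parsestr(raw)
    missing = [h for h in REQUIRED_HEADERS if msg[h] is None]
    if missing:
        raise ValueError("missing headers: " + ", ".join(missing))
    # Newsgroups is a comma-separated list; these are our 'news categories'.
    groups = [g.strip() for g in msg["Newsgroups"].split(",")]
    return {
        "message_id": msg["Message-ID"],
        "newsgroups": groups,
        "subject": msg["Subject"],
        "body": msg.get_payload(),
    }
```

For example, an article with "Newsgroups: lnn.kernel, lnn.announce" (made-up names) would come back filed under both categories.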

The second section is the user-oriented side. Each server's admin can
(optionally) have scripts to parse the news spool into some sort of
sql-based database on that server, in whatever format the admin likes, and
then feed a webpage from that sql database. Alternatively, the admin can
turn the news spool into a mailing list, with or without moderation.
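One way the spool-to-database step might look, sketched with SQLite from the Python standard library. The table layout, column names, and the functions open_db, store, and latest are all assumptions for illustration; each server is free to use whatever database and schema it likes.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the article database. The schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " message_id TEXT PRIMARY KEY,"
        " newsgroup  TEXT NOT NULL,"
        " subject    TEXT NOT NULL,"
        " posted     TEXT NOT NULL,"  # ISO 8601, so text order is time order
        " body       TEXT NOT NULL)"
    )
    return db


def store(db, message_id, newsgroup, subject, posted, body):
    """Insert one article; a Message-ID already present is left untouched."""
    db.execute(
        "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?, ?)",
        (message_id, newsgroup, subject, posted, body),
    )
    db.commit()


def latest(db, newsgroup, limit=10):
    """Return (subject, posted) for the newest articles in one newsgroup."""
    rows = db.execute(
        "SELECT subject, posted FROM articles"
        " WHERE newsgroup = ? ORDER BY posted DESC LIMIT ?",
        (newsgroup, limit),
    )
    return rows.fetchall()
```

A webpage script would then only need something like latest(db, "lnn.kernel") to build its front page, and re-running the import over the same spool is harmless because duplicates are ignored.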

Section two will be the primary means of reading news for the user. Some
servers will provide news directly to the user, and some servers will exist
only as backbone servers supporting LNN. The user-oriented servers
might want to have news expire from their spool pretty frequently, whereas
the non-user-oriented servers might archive all the news that goes through
them (similar to dejanews), meaning we could have good distributed news
archives, searchable in a variety of ways.
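The difference between an expiring server and an archiving one can be reduced to a single per-server setting. This is only a sketch: the expire function and its max_age_days parameter are hypothetical, and real news software has its own expiry configuration.

```python
from datetime import datetime, timedelta


def expire(spool, now, max_age_days=None):
    """Return the articles a server keeps after an expiry run.

    spool maps Message-ID to posting time. max_age_days=None models an
    archive server that never expires anything (an assumed convention).
    """
    if max_age_days is None:
        return dict(spool)
    cutoff = now - timedelta(days=max_age_days)
    return {mid: posted for mid, posted in spool.items() if posted >= cutoff}
```

A user-oriented server might run this with a small max_age_days, while a backbone archive would leave it unset and keep everything.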

Another extension to this is that various servers might have a list of
clients who want a news feed but don't want to be part of the actual Linux
News Network itself. This is something that each server can decide on its
own; we don't have to deal with that (yet).

Advantages:

* This takes advantage of existing code. We have very little to write.
* This requires fewer machines, since the news providers also serve as
  news distributors behind the scenes.
* If multiple servers go down, the rest will still get by and perform
  as usual.
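To check the robustness claim, here is a small sketch that treats the network as a graph of peering links and asks which servers an article can still reach when some are down. The reachable function and the five-server ring in the usage note are made up for illustration.

```python
from collections import deque


def reachable(peers, origin, down=()):
    """Return the set of servers an article posted at origin can reach.

    peers maps each server to the servers it exchanges news with.
    Servers listed in down neither receive nor forward articles.
    """
    down = set(down)
    if origin in down:
        return set()
    seen = {origin}
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for peer in peers.get(server, ()):
            if peer not in seen and peer not in down:
                seen.add(peer)
                queue.append(peer)
    return seen
```

With five servers peered in a ring (a-b-c-d-e-a), losing c still leaves a, b, d and e connected, but losing both c and e cuts d off from a and b. So this advantage holds only as long as each server keeps enough redundant peers.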

Disadvantages:

* Since some servers provide both news to users and news to servers,
  they will be doing more work than they currently are. But I don't expect
  the nntp part of things to add much load at all compared to users.

Conclusion:

This is a robust distributed model based on already-tested code. I
haven't worked out the details yet, but I believe this is the right
path to take.

Comments are very much appreciated on any of these three potential
solutions. I'm sure I'm wrong somewhere. Please argue with me. :)