
Solution two: single SQL database with mirrors



There is a single large, well-connected server with a fast SQL database
that stores all of the news articles, indexed by date, category, etc.
Several mirrors around the world are kept coherent by the main server:
whenever it gets a new article, it floods it to the mirrors. It is fine
that the mirrors do not all hold exactly the same database at any given
time, since a lagging mirror will only be missing the most recent news,
and clients can reload if they want something newer. When a mirror is
down, news for it queues on a per-mirror basis, so no complex negotiation
is needed to resync the mirror when it comes back online.
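
The per-mirror queueing could look roughly like the sketch below. This is
only an illustration: the Master class, its method names, and the send()
callback are made up for the example, and a real master would keep the
queues in the database rather than in memory.

```python
from collections import deque


class Master:
    """Hypothetical master that keeps one outgoing queue per mirror."""

    def __init__(self, mirror_names):
        self.articles = []  # stands in for the master's own database
        self.queues = {name: deque() for name in mirror_names}

    def publish(self, article):
        # Store the article locally, then queue it for every mirror.
        self.articles.append(article)
        for queue in self.queues.values():
            queue.append(article)

    def flush(self, name, send):
        # send(article) is assumed to return True on success and False
        # if the mirror is unreachable. Undelivered articles stay queued,
        # in order, so a returning mirror simply drains its backlog.
        queue = self.queues[name]
        delivered = 0
        while queue and send(queue[0]):
            queue.popleft()
            delivered += 1
        return delivered
```

A mirror that was down just gets its backlog the next time flush() is
called for it; nothing has to be compared or negotiated.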

Definitions:
master: the main server
mirror: an alternate server that tries to keep the same db as the master
server: a machine that's either the master or a mirror
user: a person who wants to read news
client: a site that distributes news to users

Each news article is given a nonce (a timestamp from the master) so clients
can keep track of the order of news in the database. When clients connect,
they provide the timestamp of the last news article they got, and the server
gives them newer articles.
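
As a rough sketch of that exchange on the server side, assuming a table
layout I made up for the example (articles with stamp, category, title)
and assuming the master never hands out the same timestamp twice:

```python
import sqlite3


def open_db():
    # In-memory database for illustration; the schema is an assumption.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles ("
        " stamp INTEGER PRIMARY KEY,"  # timestamp assigned by the master
        " category TEXT,"
        " title TEXT)"
    )
    return db


def articles_since(db, last_stamp):
    # Return every article strictly newer than the client's last timestamp,
    # oldest first, so the client can apply them in order.
    rows = db.execute(
        "SELECT stamp, category, title FROM articles"
        " WHERE stamp > ? ORDER BY stamp",
        (last_stamp,),
    )
    return rows.fetchall()
```

The client remembers the largest stamp it received and sends that value
on its next connection. A client with no articles yet can send 0.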

Clients have a list of available servers, possibly ordered based on
geographical location to increase performance. When a client wants to
connect, it runs down its list of available servers until it successfully
connects to one.
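
Running down the list might be sketched like this. The function name, the
default timeout, and the injectable connect argument are assumptions made
for the example; only socket.create_connection is a real library call.

```python
import socket


def connect_first_available(servers, timeout=5.0,
                            connect=socket.create_connection):
    """Try each (host, port) pair in order; return the first connection.

    Returns None when every server in the list has been tried and failed,
    which is how the client knows that nothing is up.
    """
    for host, port in servers:
        try:
            return connect((host, port), timeout=timeout)
        except OSError:
            # Refused, unreachable, timed out, or unresolvable: try the next.
            continue
    return None
```

Ordering the list by geographical closeness then just means sorting it
before calling the function.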

Alternative: the DNS server for linuxnews.org is rigged so that the name
resolves to multiple servers. This has the advantage that clients need not
be told when a server is added or removed. On the other hand, I haven't found a
good way for the clients to know when no servers are up: without a list, they
never know whether they have finished going through all of them.
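
One possible way around this, offered only as a sketch: if the resolver
hands back every address record for the name in a single reply (an
assumption about how the DNS would be set up), the client can build its
list at connect time and will know when it has reached the end. The
function name and the injectable resolve argument are made up for the
example.

```python
import socket


def resolve_servers(name, port, resolve=socket.getaddrinfo):
    """Return the distinct socket addresses that `name` resolves to."""
    try:
        infos = resolve(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name did not resolve at all, so there is nothing to try.
        return []
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        if sockaddr not in addresses:  # drop duplicate records
            addresses.append(sockaddr)
    return addresses
```

The client would then try each returned address in turn and give up once
the list is exhausted. This does not help if the DNS only returns one
record per query in rotation.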

It would be very nice to not have to keep a server list on each client.

Submissions are given to the clients, which pass them on to the master.
If the master is down, the clients either
* spool them locally until the master comes up. This is a pain for the
  client.
* send them to a mirror, and that mirror spools them for the master.
  But what if the mirror goes down with something in its spool?
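
Whichever machine holds the spool, keeping it on disk is one way to survive
a crash with submissions still pending. The sketch below is hypothetical
(the DiskSpool class, the one-JSON-object-per-line file format, and the
send_to_master callback are all assumptions for the example):

```python
import json
import os


class DiskSpool:
    """Hypothetical on-disk spool: one JSON-encoded submission per line."""

    def __init__(self, path):
        self.path = path

    def add(self, submission):
        # Append and force to disk so a crash does not lose the submission.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(submission) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self, send_to_master):
        # send_to_master(submission) is assumed to return True on success.
        # Stop at the first failure so submissions stay in order.
        items = self.pending()
        sent = 0
        for item in items:
            if not send_to_master(item):
                break
            sent += 1
        # Rewrite the spool with whatever is left, replacing it atomically.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for item in items[sent:]:
                f.write(json.dumps(item) + "\n")
        os.replace(tmp, self.path)
        return sent
```

Note that a crash between sending and rewriting the file would cause some
submissions to be sent twice, so the master would have to tolerate
duplicates.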

Advantages:

* This is very easy to implement, compared to more distributed models.
* This keeps the load off client machines like freshmeat and slashdot,
  who should be spending their load on user requests rather than client
  requests.
* If the master goes down, people can still retrieve relatively recent news
  from the mirrors.

Disadvantages:

* This relies a great deal on the master being stable and fast.
* It does not scale well: once hundreds of clients are submitting, the
  master has to get faster and faster.
* If the master goes down, no new articles will appear until it's
  back up.
* Since the servers are different machines from the clients, we need more
  machines.

Conclusion:

This is a three-tiered model, and it introduces much more complexity than
the simple client/server model. It requires twice as much new scripting to
function properly, and it's much easier to break. While it distributes the
load onto more machines, it doesn't actually make things much more stable.