[IRCServices] Closer looks at services databases?

Sat Oct 14 13:13:39 PDT 2000

Hi Andrew,

Just for interest sakes (I know we have talked about this *MUCH* in the
past), but just what exactly is ment by services "reading once, and writing
many".  I totally like do not understand that interpretation of what exactly
services is doing, and howcome this is the case in the scenario of a "data
update" or "data retrieval".

If a nickname is registered, read not at all, write many
If a nick is changed, read once, write perhaps more than which is required?
If a memo is read, read ALLOT, write one bit?
etc.

Shouldn't there perhaps rather be looked into the ammount of information
services writes to the databases?  I admit yes, I have not looked at the
code seeing that I don't have it at the moment, but from what I remember
back the last time I looked at it, yes, the database code was a mess, and
because of that, I don't understand it.  Which brings me back to the
question of what exactly does services do with its database?

A quick example (I am talking under speculation here - as I said before, I
did not look at the code, and I wont bother looking at the code, because its
to compicated for a avarage programmer which I hope I can be seen as).

Say we have a nick registered in the database (moo).  moo registers his
nick, and this time, the entire nickserv structure is copied into the
database for the nickname.  However, now, user moo does a update.  Lets say,
/msg nickserv set email blah at blah.com

My question is this:  How exactly does services process this update?
Surely, if you use the nickname, like in this case moo as a reference, can
it be that severe on large networks if you just update a string in the
database?  Sure, I can understand that if you delete the record entirely,
and regenerate it for the update, that it is a long process, but updates
should NOT be a complicated procedure.

In the past, I made (I think), quite a big effort on getting services to
support SQL based databases, and it is exactly for this reason.  As far as
physical searches (locating of the user's registration details) through the
database goes, they are 1) accellerated, and 2) they are cached.  This is
what makes SQL databases good for high-information stores (allot of
records).

Secondly, the records can also be automatically assigned registration ID.
Which, normally is ALLOT smaller than the actual nick (less bits in size,
the data travels faster through the wires).  A simple scenario ( I realise I
am missing settings here, this is only as example), a database structure on
SQL may be:

NickID (int)
NickName (varchar)
EMailAddress (varchar)
SetKillImmed (bit) !!!!!!
SetKillQuick  (bit) !!!!!!
SetKillWhatever (bit).

The basic idea behind this, is to use the smallest possible size of data in
the db.  The bits, is either a 1 or a 0, and because the NickID is a
integer, it is also smaller than the NickName which is varchar.

Microsoft's SQL Server uses the following sizes for its tables: (This varies
from every type of Database, like MS-SQL, ProgreSQL, MySQL, DB1, DB2, DB3,
etc) These are the default field lengths for data to be stored as native
file storage type (nullable data is the same length as nonnull data, and
character data is always stored in character format).

Data type            Default length (characters)
bit                      1
binary                 Length defined for the column
varbinary            Length defined for the column
image                 0
datetime             8
smalldatetime     4
float                   8
real                    4
int                      4
bigint                 8
smallint              2
tinyint                1
money               8
smallmoney        4
decimal             *See footnote
numeric             *See footnote
uniqueidentifier  16
timestamp          8

* Numeric data types with fixed precision and scale.
Precision                  Storage bytes
1 - 9                        5
10-19                      9
20-28                      13
29-38                      17

The point at which I am getting, is that services is allocating allot of
bytes in its tables to chars which is not needed, and therefor, also needs
to update more than it needs to, aswell as send more information over the
IRC network than what is needed.

Secodly, by utilising a identifier (such as NickID set in my example), your
updates, sending of memos, and various other procedures related to a
nickname (for example), is ALLOT faster.

Say, I issue UPDATE table SET SetKillImmed='1'  WHERE NickID='12' (The user
does a /msg nickserv set kill immed).  This is 50 (If I counted correctly),
characters that services needs to send to the database.  The error checking
is a simple matter of checking if return value is success or failure (7
characters), so in total, the entire update procedure, will send and receive
57 characters.  Can this be said for the DBs currectly in use?

Furthermore, with regards to SQL's caching abilities.

I have a 166MMX, 80MB Ram, Running Windows NT 4.0 atm.  In idle state, my PC
uses:  CPU Load: 15%, Memory in Use (Pagefile): 138MB (Peaked at 150MB)

I also run MS SQL 2000, and my example is in a table I use to archive the
BugTraq Mailing list.  The table looks as follows (to give you an idea of
the sizes):
 ID (int) IDENTITY (1, 1) NOT NULL ,
 MessageID (nvarchar(150)) NOT NULL ,
 InReplyTo  (nvarchar(150)) NULL ,
 MsgDate  (nvarchar(40)) NOT NULL ,
 MsgFrom (ntext) NOT NULL ,
 MsgSubject (ntext) NULL ,
 MsgBody (ntext) NULL

For a indication of the specific field sizes:
int: (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647).
nvarchar: Variable-length Unicode data with a maximum length of 4,000
characters.
ntext: Variable-length Unicode data with a maximum length of 2^30 - 1
(1,073,741,823) characters.

As you can realise from this, one record in the table, allocates quite allot
of data. BTW: the table currectly holds 1523 records.  Now, let us do some
statistics:

First, a lookup without utilising the cached index on the table, the command
executed:
SELECT * FROM BugTraq WHERE ID='67'
Event Class:               Duration:  CPU:  Reads:  Writes:
SQL:StmtCompleted 140          10      0            0

Now, the same lookup BUT, I am using the cached index:
SELECT * FROM BugTraq WITH (INDEX(IX_BugTraq)) WHERE ID='67'
Event Class:               Duration:  CPU:  Reads:  Writes:
SQL:StmtCompleted 70            10       0           0

As you can see, the load stays the same, BUT, the duration (aka speed) of
the query droped by 50%, thus, giving better performance. Now, let us
update?
UPDATE BugTraq SET InReplyTo='something' WHERE ID='67'
Event Class                Duration: CPU: Reads: Writes:
SQL:StmtCompleted 231          30     0          8

Note, writes 8, reads = 0!!

In other words, by using something like SQL, you will once more, SAVE on
performance and speed, because of the fact that the location of the record
in the database does NOT need to be located...

As a closing, I would just like to ask if someone DID in fact do detailed
analysis of what exactly is happening in the services database?  As I've
shown you here, a SQL based database CAN and WILL increase the services
performance, aswell as to LOWER the actual ammount of information that is
needed to be transmitted to and from the services to the IRC Network.  Can
the same results be given for the dbs services are currently using, and if
so, can I be proven wrong that a PROPERLY IMPLICATED SQL based Services
database, will NOT enhance IRC Services?

This read once / write many times in my books, is a pile of horse ....  It
is of no relavence whether there is being read or written to the database.
The question is about WHAT is written / read, and WHERE in the database is
it being sent to, or read from, and if services actually KNOW where the
information is that needs to be updated.

Thus, my closing questions are:
1) Can someone provide me with the structures of all the dbs so that I can
properly port it to SQL.
2) Can there be added SQL support in a non-public version of Services for
speed analysis
3) How and where does services read / write / update information in its
database
4) How does Services keep record of where in its databases information is
stored.
5) Does anyone have specs (detailed) about the speed of the services
database at the current moment?  (Note, I'm talking about accurate speed
analysis of the databases IN USE by services, accessing the information the
SAME way services does.)  Surely, if you are looking at the database, you
CANT tell me SQL is not going to enhance performance if you DONT have
information backing up your accusations...

If for some reason you still believe I am wrong about SQL Andrew, let us
test this, and compare our results?  I'll bet you R100 :P  SQL is
multithread aswell, which adds ALLOT more speed into services by allowing it
to make more than one query at a time (for starters).

---
Regards,
Chris Knipe
Cell: (083) 430-8151

---------------------------------------------------------------
To unsubscribe, send email to majordomo at ender.shadowfire.org
with "unsubscribe ircservices" in the body, without the quotes.