From jkl@ccl.net  Sun Nov 24 23:11:30 1996
From: Jan Labanowski <jkl@ccl.net>
Date: Sun, 24 Nov 1996 23:08:09 -0500
Message-Id: <199611250408.XAA10271@krakow.ccl.net>
To: chemistry@www.ccl.net
Subject: CCL needs your input
Cc: jkl@ccl.net


Dear Netters,

The good news is that we have an additional full-time person to work
on list improvement and the Web site. David Tinapple joins the CCL
staff on Monday, Nov. 25. He has substantial Web authoring experience
in the commercial as well as the non-profit sector. He participated in
creating several multimedia CD-ROMs, and we are told that he has a good
sense of where simplicity meets elegance in Web site development.
We already have great plans, and once he goes through the megabytes of
stuff in the CCL archives and gets over the new-hire red tape, we will see
things happening faster.

We know, and you know, that CCL discussions sometimes get too broad
for some subscribers. Besides, if we asked you what computational
chemistry is, we would get a lot of different answers. So for those of you
who are interested only in the subset of CCL messages on a given topic,
we want to provide an automatic filter.

How will it work?
=================
The current list stays as it is for those who want it this way. Everybody
sends messages to the normal address: chemistry@www.ccl.net. The text of
each incoming message is analyzed by the software and tagged according to
its contents. A message may belong to several subjects. For example, a
message dealing with "The basis set dependence of Force Field parameters
for Molecular Mechanics derived with Density Functional approaches" may
belong to many subjects, e.g., Molecular Mechanics, Density Functional
Methods, Basis Sets, Quantum Chemistry. The example also shows that the
subjects chosen need not be "orthogonal": they can overlap a great deal,
or even be embedded within another subject.
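
To make the tagging idea concrete, here is a minimal sketch in Python. It
is only an illustration: the subject names come from the example above,
but the patterns in SUBJECT_PATTERNS and the function name tag_message
are assumptions made up for this sketch, not the lists or software CCL
will actually use.

```python
import re

# Hypothetical subject definitions: each subject is a list of regular
# expressions. The patterns are illustrative only, not CCL's real lists.
SUBJECT_PATTERNS = {
    "Molecular Mechanics": [r"molecular\s+mechanics", r"force\s+fields?"],
    "Density Functional Methods": [r"density\s+functional", r"\bDFT\b"],
    "Basis Sets": [r"basis\s+sets?"],
    # Deliberately overlaps with the two subjects above ("non-orthogonal").
    "Quantum Chemistry": [r"density\s+functional", r"basis\s+sets?",
                          r"ab\s+initio"],
}


def tag_message(text):
    """Return the sorted list of subjects with at least one matching pattern."""
    tags = []
    for subject, patterns in SUBJECT_PATTERNS.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            tags.append(subject)
    return sorted(tags)


example = ("The basis set dependence of Force Field parameters for Molecular "
           "Mechanics derived with Density Functional approaches")
print(tag_message(example))
```

With these made-up patterns the example message is tagged with all four
subjects, which is the overlap described above.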

Once we are finished, we will provide a new signup form for the CCL,
and you will be able to migrate from the "everything" CCL to
subject-oriented delivery (we may also provide other options, such as a
maximum message size, "this subject, but only if together with that
subject", and others). Of course, you will be able to choose as many
subjects as you want.
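
As a rough sketch of how such per-subscriber options could be checked at
delivery time (in Python; the Subscription class, its field names, and
should_deliver are all hypothetical, and the final options may well
differ):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Subscription:
    """Hypothetical per-subscriber settings; all field names are assumptions."""
    subjects: Set[str]                    # subjects the subscriber chose
    max_bytes: Optional[int] = None       # optional maximum message size
    # "This subject, but only if together with that subject":
    # maps a chosen subject to the companion subject it requires.
    requires: Dict[str, str] = field(default_factory=dict)


def should_deliver(sub, message_tags, message_bytes):
    """Decide whether a tagged message goes to this subscriber."""
    if sub.max_bytes is not None and message_bytes > sub.max_bytes:
        return False
    tags = set(message_tags)
    for subject in sub.subjects & tags:
        companion = sub.requires.get(subject)
        if companion is None or companion in tags:
            return True
    return False


sub = Subscription(
    subjects={"Basis Sets", "Molecular Mechanics"},
    max_bytes=10000,
    requires={"Basis Sets": "Density Functional Methods"},
)
print(should_deliver(sub, {"Basis Sets"}, 500))
print(should_deliver(sub, {"Basis Sets", "Density Functional Methods"}, 500))
```

Here a message tagged only "Basis Sets" is held back, because this
subscriber asked for that subject only together with "Density Functional
Methods".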


What do we need from you?
=========================
Vote!!! That is, tell us which subjects you would choose for yourself,
and which you never want to see {:-)} ("Never say never again").
Fill in the form at:
      http://www.ccl.net/ccl/ccl-subjects.html
or simply send a message to jkl@ccl.net (Jan Labanowski) listing the
subjects you would personally be interested in receiving, and the subjects
you would not like to see. We need your initial response to decide how many
subjects we want to implement. WE NEED YOUR MASSIVE RESPONSE to really
help us make rational choices. So please, participate. BUT SEND
MESSAGES ABOUT IT TO jkl@ccl.net rather than to the list, please. This
survey will only take you a minute or so to answer (unless you are my type,
i.e., "paralysis of analysis"). But do not be pedantic, since we need only
general guidance at this time. Save your deliberations for the moment when
we ask you to make the actual choice.


How will we approach the problem?
=================================
While it looks simple, it is not. So if you know how such things are done,
let us know. We could use the sophisticated stuff spooks use, with FFT
and the like. But we are considering the following general approach. Each
subject will be defined as a series of keywords (or actually flexible
matches, called regular expressions in UNIX). These keywords will be
checked against the text of the message to qualify the message as belonging
to a given subject. There also has to be some ranking mechanism, and
keywords may be assigned different weights ("importance"). The proximity
of other words can also be taken into account. Of course, one could take
a training set of past messages from the CCL archives, assign subjects to
them, and run neural net stuff on them. While we do not rule out this
approach, we will probably do it in a more deterministic way, so that we
can add the new buzzwords which appear every month to the keyword lists.
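
A minimal sketch of the weighted-keyword ranking, in Python. Everything
specific here is an assumption made for illustration: the patterns, the
weights, the THRESHOLD value, and the function names. Proximity of other
words is not modeled at all.

```python
import re

# Hypothetical weighted keyword lists: (regular expression, weight).
# Patterns and weights are invented for illustration.
WEIGHTED_KEYWORDS = {
    "Density Functional Methods": [
        (r"density\s+functional", 3.0),
        (r"\bDFT\b", 3.0),
        (r"exchange[- ]correlation", 2.0),
        (r"\bfunctionals?\b", 1.0),
    ],
    "Molecular Mechanics": [
        (r"molecular\s+mechanics", 3.0),
        (r"force\s+fields?", 2.0),
        (r"\bAMBER\b|\bCHARMM\b", 1.5),
    ],
}

THRESHOLD = 3.0  # assumed cutoff below which a subject is not assigned


def score_message(text):
    """Sum weight * number of matches for every subject."""
    scores = {}
    for subject, keywords in WEIGHTED_KEYWORDS.items():
        total = 0.0
        for pattern, weight in keywords:
            hits = len(re.findall(pattern, text, re.IGNORECASE))
            total += weight * hits
        scores[subject] = total
    return scores


def rank_subjects(text, threshold=THRESHOLD):
    """Return (subject, score) pairs at or above the threshold, best first."""
    ranked = sorted(score_message(text).items(),
                    key=lambda item: item[1], reverse=True)
    return [(subject, score) for subject, score in ranked
            if score >= threshold]


print(rank_subjects("Which functional should I use for DFT force field fitting?"))
```

Because the keyword lists are plain data, a new buzzword can be added by
appending one (pattern, weight) pair, with no retraining. A message for
which rank_subjects returns an empty list did not rank high enough for
any subject.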

After we decide on subjects, we will create a Web form, and you will
be asked again for help, this time to provide us with keywords typical
of a given subject. We will also go through the archives and assign past
messages to subjects, split messages into words, and rank the frequency of
words in a given subject. While we are sure that "and", "molecule", etc.
will be the leaders, we also hope to find some specific words which
occur within subjects.
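
One possible way to get past the "and"/"molecule" problem is to rank
words not by raw count but by how much more frequent they are within a
subject than in the archive as a whole. The Python sketch below shows
that idea only; the tiny archive, the function names, and the choice of
a simple frequency ratio are assumptions, not a decided method.

```python
import re
from collections import Counter


def word_counts(messages):
    """Count lower-cased alphabetic words over an iterable of message texts."""
    counts = Counter()
    for text in messages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def distinctive_words(archive, subject, top=5):
    """Rank a subject's words by relative frequency in the subject divided
    by relative frequency in the whole archive (higher = more specific).

    `archive` maps subject name -> list of message texts already assigned
    to that subject (a hypothetical, hand-labeled training set).
    """
    subj = word_counts(archive[subject])
    overall = word_counts(m for msgs in archive.values() for m in msgs)
    subj_total = sum(subj.values())
    overall_total = sum(overall.values())
    ratio = {
        word: (n / subj_total) / (overall[word] / overall_total)
        for word, n in subj.items()
    }
    # Sort by descending ratio; break ties alphabetically.
    return sorted(ratio, key=lambda w: (-ratio[w], w))[:top]


# Made-up miniature archive, for illustration only.
archive = {
    "Basis Sets": ["the basis set and the molecule",
                   "diffuse basis and the molecule"],
    "Docking": ["the ligand and the molecule",
                "the docking and the receptor"],
}
print(distinctive_words(archive, "Basis Sets", top=3))
```

In this toy archive, common words such as "the" and "and" get a ratio
below one and drop out of the top of the list, while words that occur
only in the "Basis Sets" messages rise to the top.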

Then we will write the software. The software has to be smart, and
distribution especially has to be well optimized. We also need to think
about messages which are outside the scope of any subject (did not rank
high enough) -- we may simply handle them by hand and improve our keyword
lists. We will also provide a way for authors to assign their messages to
subjects, but we do not want to enforce it. Besides, who knows what will
show up in the process of doing this. But we think that it may be a nice
and needed tool once we can make it work. If it works well enough, we may
want to make it available, so you can use it as a filter for your other
mailing lists and newsgroups. But do not be impatient, as it will take some
time to create it.

So please help us (and help yourself too, in the process).

Jan Labanowski
CCL Admin
jkl@ccl.net


