Cartesian to PDB Conversion - A Summary
Dear Netters,
Below is a list of the solutions I received in my quest for a
routine to convert Cartesian coordinates into SYBYL-readable
PDB format. This list does not include Jan's posting of the
Fortran executables he has made available via anonymous ftp, nor
the many generous offers of help in trying to solve this
problem. Again, thanks to one and all who responded to my query!
-mark z.
1)
Mark,
The way I have been doing this type of transformation is as
follows:
-with the XYZ file, do a run with Mopac using the 0SCF
keyword. If your version of Mopac (I assume you have
one) is 6.0, the output (*.out file) should give you the
system's coordinates in both XYZ and INTERNAL forms.
-from the output, copy the block that contains the system
in internal coordinates (last block) into a file that must
contain the upper part (everything, but the coordinates)
of an *.arc file from Mopac. This new file must be named
*.sta to be read by Sybyl. Also, in the just created *.sta
file, the 3rd non-blank line (line containing the total
number of atoms of every type) must be modified accordingly.
Obviously, the starting *.arc file (if you don't have one)
can be easily created by doing a Mopac run (on the order of
seconds) of a simple system.
NOTES:
-In my version of Sybyl I also have to change the second
non-blank line (version line) of the just-created *.sta file to
read "VERSION 5.00", and I have to edit the coordinate columns
to look exactly like the example below. These last operations
can be readily carried out with a simple UNIX script (perhaps
in "nawk").
-------------------------------CUT HERE---------------------------------
SUMMARY OF AM1 CALCULATION
VERSION 5.00
AM1 BONDS EF
/tmp_mnt/home/eby/rafael/sybyl/pbztPolymopac2.dat
DEFINING A DUMMY ATOM (WHICH COINCIDES WITH THE Tv)
C 0.0000000 0 0.000000 0 0.000000 0 0 0 0
C 1.4023182 1 0.000000 0 0.000000 0 1 0 0
C 1.3925309 1 120.491775 1 0.000000 0 2 1 0
C 1.4018573 1 120.013080 1 0.011448 1 3 2 1
C 1.4022088 1 119.501337 1 -0.006766 1 4 3 2
C 1.4017758 1 119.481924 1 -0.008443 1 1 2 3
H 1.1032095 1 120.192216 1 -179.994083 1 2 1 3
H 1.1019895 1 119.576861 1 -179.990156 1 3 2 1
H 1.1031646 1 120.204522 1 -179.998951 1 5 4 3
H 1.1019215 1 120.416823 1 -179.997841 1 6 1 2
C 1.4599093 1 121.035896 1 179.993477 1 4 3 2
N 1.3231421 1 125.404264 1 0.033392 1 11 4 3
S 1.7494256 1 119.181316 1 -179.962314 1 11 4 3
C 1.4068053 1 109.848965 1 -179.998738 1 12 11 4
C 1.4014290 1 125.023003 1 -179.995792 1 14 12 11
C 1.3872028 1 118.164477 1 -179.999537 1 15 14 12
C 1.4432826 1 121.214175 1 0.004160 1 16 15 14
C 1.4014243 1 120.626432 1 -0.003837 1 17 16 15
C 1.4432322 1 114.366623 1 0.004121 1 14 12 11
H 1.1009352 1 120.485763 1 0.002677 1 15 14 12
H 1.1008893 1 120.515321 1 -179.999542 1 18 17 16
S 1.6921053 1 129.287470 1 -179.997306 1 16 15 14
N 1.4067819 1 114.372176 1 179.996517 1 17 16 15
C 1.3231197 1 109.830044 1 0.002209 1 23 17 16
XX 1.5088405 1 125.461003 1 179.998250 1 24 23 17
Tv 12.5597124 1 0.000000 0 0.000000 0 1 25 22
0 0.0000000 0 0.000000 0 0.000000 0 0 0 0
-------------------------------CUT HERE------------------------------------
Hope this helps !!
Con Saludos,
Rafael G. Ramirez
------------------------------------------------------------------
email : rafael@eby.polymer.uakron.edu phone: (216) 972-5810
usmail: Institute of Polymer Science FAX : (216) 972-5290
The University of Akron,
Akron, OH 44325-3909
U. S. A.
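The version-line edit mentioned in the NOTES of reply 1 could be scripted roughly as follows. This is only a sketch: the function name fix_sta_version and the header lines in the demo are invented for illustration, and the exact column layout your Sybyl version expects may differ. It does not touch the atom-count line or the coordinate columns, which still need to be edited as Rafael describes.

```shell
#!/bin/sh
# fix_sta_version (hypothetical name): copy stdin to stdout, replacing the
# second non-blank line with "VERSION 5.00". All other lines pass through.
fix_sta_version() {
    awk '
        NF > 0 { nonblank++ }          # count only lines that contain text
        NF > 0 && nonblank == 2 {      # the version line
            print "VERSION 5.00"
            next
        }
        { print }
    '
}

# Demo on an invented header; the second non-blank line is rewritten.
printf 'SUMMARY OF AM1 CALCULATION\n\nVERSION 6.00\nAM1 BONDS EF\n' | fix_sta_version
```

In practice one would run something like fix_sta_version < old.sta > new.sta, where both file names are placeholders.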
2)
#! /bin/sh
awk '{printf "%s %6s %s %-3s %2s %5s %11.3f %7.3f %7.3f %s %s\n",
"ATOM", NR, "", $1, "RES", "1", $2, $3,
$4, " 1.00", " 0.00"
}' $1
The above shell script should convert simple cartesians into pdb
format. The first $1 refers to column 1 of the input file which should
contain the atom name. $2,$3,$4 refer to the cartesian coordinates, in
this case in columns 2,3 and 4 of the input file. These can of course be
changed if the atom name and coordinates are in different columns. The
second $1 refers to the input file. Just put the above into a file, call
it "con", make it executable and type con input_file > output_file.
This should give a format readable as "pdb" format.
Cheers
Nick Tomkinson
chs1nt@surrey.ac.uk
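A variant of the script in reply 2 that pins every field to the standard PDB ATOM columns might look like the sketch below. The function name xyz2pdb is made up for this example, the residue name "RES" and residue number 1 are carried over from Nick's script, and the input is assumed to be "name x y z" on each line. Whether SYBYL needs this strict layout is not something the replies confirm.

```shell
#!/bin/sh
# xyz2pdb (hypothetical name): read "name x y z" lines on stdin and
# write fixed-column PDB ATOM records on stdout.
xyz2pdb() {
    awk 'NF >= 4 {
        # Names shorter than four characters start in column 14.
        name = (length($1) < 4) ? " " $1 : $1
        # Columns: 1-6 record, 7-11 serial, 13-16 name, 18-20 residue,
        # 23-26 residue number, 31-54 x/y/z, 55-66 occupancy and B-factor.
        printf "ATOM  %5d %-4s %3s  %4d    %8.3f%8.3f%8.3f%6.2f%6.2f\n",
               ++n, name, "RES", 1, $2, $3, $4, 1.0, 0.0
    }'
}

# Demo with two atoms taken from the sample data in reply 5.
printf 'C1 9.69488 2.44667 -0.33565\nH141 10.96950 2.09754 0.97116\n' | xyz2pdb
```

Usage would be along the lines of xyz2pdb < input_file > output_file.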
3)
C
C GAUPDB.FOR This program transforms gaussian cartesian format
C into PDB files. Using PDB format, you can read
C your coordinates into SYBYL. Program was written
C and compiled on VAX under VMS. Input files should
C be trimmed out of your Gaussian output and given
C the extension .XYZ. Output files will have the extension
C .PDB. See comments below!! Questions about the code:
C
C Dr. Rick Gussio
C NCI-Biomedical Supercomputing Center
C P.O. Box B, Bldg. 430
C Frederick, MD 21202
C
C Email: gussio@ncifcrf.gov
C
CHARACTER FILENAME*35,OUTFILE*35,TITLE*80,WLINE*80,ATOM*4
CHARACTER TYPE*4,RES*3,TRSH1*1,TRSH2*2,TRSH3*3,TRSH4*4,SEGID*4
CHARACTER FI*35
REAL X,Y,Z,W,PAF
INTEGER ATOMNO,TYPENO,RESNO,TOTATO,I
C
C line headings
C
ATOM='ATOM'
PAF=0
C
C formats for total line read
C
10 FORMAT(A4)
15 FORMAT(I5)
20 FORMAT(A80)
C
C a few formats
C
30 FORMAT(1X,I4,7X,I4,7X,3F12.6)
40 FORMAT(A4,I7,2X,A3,1X,A3,3X,I3,4X,3F8.4)
C
C prompt user for filename
C
WRITE(6,*)' '
WRITE(6,*)' Please ENTER the Filename WITHOUT the Extension : '
READ(5,43) FILENAME
WRITE(6,43) FILENAME
43 FORMAT(A35)
C
C remove trailing spaces
C
I=LAST(FILENAME,35)
C
C Find cartesian coordinates in the gaussian output file: eg.
C
C 1 8 -3.080796 -0.357418 -0.065404
C 2 1 -4.063259 -0.115795 -0.258498
C 3 1 -2.940725 -0.680416 0.902559
C 4 7 -0.308698 0.805764 -0.011805
C 5 8 0.966751 1.594701 0.016611
C 6 7 1.058242 -1.403593 -0.026561
C 7 8 2.333691 -0.614656 0.001856
C
C Create a file, the file name should have
C the extension .XYZ
C
OPEN(UNIT=1,FILE=FILENAME(1:I)//'.XYZ',STATUS='OLD')
C
C output file will have .PDB extension
C
OUTFILE=FILENAME(1:I)//'.PDB'
OPEN(UNIT=2,FILE=OUTFILE,STATUS='NEW',FORM='FORMATTED',
+ ACCESS='SEQUENTIAL',CARRIAGECONTROL='LIST')
C
C
C read title lines
C file filter
C
80 CONTINUE
C
READ(1,20,END=1000) WLINE
READ(WLINE,15) ATOMNO
READ(WLINE,30) ATOMNO,TYPENO,X,Y,Z
WRITE(6,30) ATOMNO,TYPENO,X,Y,Z
RESNO = 1
TRSH1 = 'G'
C RES is CHARACTER*3, so the residue name is limited to three characters
RES = 'GAU'
C map atomic number to element symbol; unlisted elements default to 'X'
TYPE= 'X '
IF( TYPENO .EQ. 1 ) TYPE= 'H '
IF( TYPENO .EQ. 6 ) TYPE= 'C '
IF( TYPENO .EQ. 7 ) TYPE= 'N '
IF( TYPENO .EQ. 8 ) TYPE= 'O '
IF( TYPENO .EQ. 15) TYPE= 'P '
IF( TYPENO .EQ. 16) TYPE= 'S '
IF( TYPENO .EQ. 17) TYPE= 'CL '
IF( TYPENO .LE. 0 ) TYPE= 'X '
TRSH3= 'G'
SEGID='GAUS'
TRSH4='G'
W=0.000
C
C write pdb file
C
WRITE(2,40) ATOM, ATOMNO,TYPE,RES,RESNO,X,Y,Z
GO TO 80
C
C close files
C
99 FORMAT(A3)
1000 WRITE(2,99) 'TER'
CLOSE(UNIT=2)
CLOSE(UNIT=1)
STOP
END
C
C Appends extensions to filenames:
C this function finds the last non blank character
C
FUNCTION LAST(TEXT,N)
CHARACTER TEXT*(*)
DO 1 I=N,1,-1
1 IF(TEXT(I:I) .NE.' ') GO TO 2
I=1
2 LAST=I
RETURN
END
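For those without a Fortran compiler at hand, the same conversion can be sketched in awk. This is not Rick's program, only an approximation of it under stated assumptions: the function name gau2pdb is invented, the input is assumed to be whitespace-separated rows of "center number, atomic number, x, y, z" as in the example in the comments of GAUPDB.FOR, and the element table covers only the atomic numbers that program handles. Coordinates are written with three decimals rather than four.

```shell
#!/bin/sh
# gau2pdb (hypothetical name): convert Gaussian-style coordinate rows
# "center atomic_number x y z" on stdin into PDB ATOM records on stdout.
gau2pdb() {
    awk '
        BEGIN {
            # Same element table as GAUPDB.FOR; anything else becomes X.
            sym[1] = "H";  sym[6] = "C";  sym[7] = "N";  sym[8] = "O"
            sym[15] = "P"; sym[16] = "S"; sym[17] = "CL"
        }
        NF == 5 {
            el = ($2 in sym) ? sym[$2] : "X"
            # One-letter symbols go in column 14, two-letter ones in 13-14.
            name = (length(el) == 2) ? el : " " el
            printf "ATOM  %5d %-4s %3s  %4d    %8.3f%8.3f%8.3f\n",
                   $1, name, "GAU", 1, $3, $4, $5
        }
        END { print "TER" }
    '
}

# Demo using the first two rows of the example in the program comments.
printf '1 8 -3.080796 -0.357418 -0.065404\n2 1 -4.063259 -0.115795 -0.258498\n' | gau2pdb
```

Lines that do not have exactly five fields are skipped, so the input still has to be trimmed to the coordinate rows as described in the header of GAUPDB.FOR.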
4)
SYBYL has an interface that was originally written for the GAUSSIAN 86
program, but I believe it will also write and read files for newer versions
of GAUSSIAN. From the command line, use the SYBYL command:
SYBYL> GAUSS86 <molecule area> RETRIEVE <fileset name> GEOMETRY
This command assumes a copy of the molecule exists in the molecule area
you entered and will update the x,y,z coordinates of the molecule using
the coordinates in the GAUSSIAN output file. Thus if you can somehow
get a copy of your molecule (with any geometry, it doesn't matter how
poor, it's only important that the atom numbering be the same as that
used in the GAUSSIAN calculation) into SYBYL, this may be a way for
you to read the GAUSSIAN structure into SYBYL.
Note that the GAUSS86 RETRIEVE command will make no changes to
connectivities or atom types. It simply modifies the x,y,z coordinates
of the atoms.
One quick and dirty way to make a starting structure in SYBYL of your
molecule that you could use with the GAUSS86 RETRIEVE command would be
to use the SYBYL command:
SYBYL> ADD RAWATOM M1 <atom name> <atom type> 0 0 0
for each atom in the molecule. This will place all the atoms on top
of each other at 0,0,0. This is OK though, because when you then use
the GAUSS86 M1 RETRIEVE command, it will place the atoms at their correct
x,y,z positions. To quickly generate the bonds, use the SYBYL command
SYBYL> CRYSIN M1 CONNECT * * NO_SYMMETRY_SEARCH BOND_LENGTH_TABLE
after you have used the GAUSS86 command. This will automatically create
bonds based on distances between atoms.
I hope this helps you with your problem. If not, let me know and we
can probably put together a little SPL script that will help you.
Regards,
Vic Lewchenko
Tripos Associates, Inc.
St.Louis, MO
victor@tripos.com
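Typing one ADD RAWATOM command per atom gets tedious for larger molecules, so the command lines could be generated from a short list. The sketch below only fills in the command template quoted in reply 4; the function name rawatom_cmds is invented, and the input is assumed to be one "atom name, atom type" pair per line, with the type strings supplied by you (the ones in the demo are placeholders). How the resulting lines are fed to SYBYL is left open here.

```shell
#!/bin/sh
# rawatom_cmds (hypothetical name): read "atom_name atom_type" lines on
# stdin and print one ADD RAWATOM command per atom, placing every atom
# at 0 0 0. The molecule area defaults to M1.
rawatom_cmds() {
    area=${1:-M1}
    awk -v area="$area" 'NF >= 2 {
        printf "ADD RAWATOM %s %s %s 0 0 0\n", area, $1, $2
    }'
}

# Demo with placeholder atom names and types.
printf 'C1 C.3\nH1 H\n' | rawatom_cmds
```

The atoms must be listed in the same order as in the GAUSSIAN calculation, since the RETRIEVE step relies on matching atom numbering.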
5)
I had the same need for a conversion program to Sybyl before we
got G92. If the newzmat utility doesn't work out for you (I haven't tried
it), I'm sending you a simple Fortran program written for the VAX that worked
for me. It converts Cartesian coordinates to a pseudo-PDB format that SYBYL will read.
Two things you may have to change around: the fortran format definitions to
suit your needs, and atom types once the molecule is in SYBYL (no big deal).
Along with the short program, I'm sending an example .com file you can use
to mimic the format as well as the output you can try in SYBYL. Let me know
if you have any questions or if you don't receive all the files.
Happy Holidays
the conversion program pdbfor.for:
C Program to Convert Cartesian Coords to Sybyl readable PDB format
DIMENSION X1(5000), Y(5000), Z(5000)
INTEGER I,J,N,X
REAL X1,Y,Z
CHARACTER*80 NAME(5000), FNAME, JUNK(5000)
I = 1
X = 0
DO 40 N = 1, 5000
READ(5,25,END=45,ERR=45) JUNK(N)
25 FORMAT(A80)
C skip lines that have nothing in columns 20-30
IF (JUNK(N)(20:30) .EQ. ' ') GOTO 40
C store each atom at the next free index so skipped lines leave no gaps
READ(JUNK(N),30,ERR=45) NAME(X+1), X1(X+1), Y(X+1), Z(X+1)
30 FORMAT(A12,3F9.6)
X = X + 1
40 CONTINUE
45 DO 50 N=1, X
WRITE(6,99) ' ATOM',N,NAME(N)(1:4),'R01','1',X1(N),Y(N),Z(N)
50 CONTINUE
99 FORMAT(A5,4X,I3,1X,A4,1X,A3,5X,A1,5X,F7.3,1X,F7.3,1X,F7.3)
END
a sample .com file:
$ mat
$ assign bac10.out sys$output
$ run pdbfor
C1 9.69488 2.44667 -0.33565
C10 9.75925 -0.54929 -2.39149
C11 10.42360 0.69546 -1.92134
C12 11.67762 0.68335 -1.44625
C13 12.20292 1.88993 -0.73423
C14 11.11370 2.53232 0.13327
C15 9.59960 1.99327 -1.83249
C16 8.15503 1.88341 -2.24958
C17 10.22275 3.12909 -2.72714
C18 12.62008 -0.46922 -1.61531
C19 6.79800 -0.80531 0.14685
C2 8.81938 1.51381 0.56641
C20 8.11897 -0.10148 3.12819
C21 8.17820 3.44656 3.62179
C22 9.35497 3.25012 4.31283
C23 9.55583 3.82641 5.46785
C24 8.56960 4.52187 6.11817
C25 7.36193 4.72855 5.49377
C26 7.15077 4.18950 4.20918
C27 7.89495 2.85724 2.32609
C28 11.62870 -0.59770 2.41494
C29 12.81577 0.02700 3.11955
C3 9.20562 -0.01024 0.67253
C30 9.86225 -1.37695 -4.62256
C31 9.40133 -1.04179 -6.04783
C4 9.28030 -0.45526 2.11508
C5 8.95327 -1.94486 2.55438
C6 8.41510 -2.90193 1.53510
C7 8.68032 -2.47646 0.14314
C8 8.31210 -0.96358 -0.16042
C9 8.45115 -0.91890 -1.70539
H10 10.40815 -1.27361 -1.85717
H13 12.52995 2.71200 -1.46352
H14 11.34803 3.49404 0.20608
H141 10.96950 2.09754 0.97116
H16 8.13443 1.40767 -3.08006
H161 7.61942 1.29595 -1.53756
H162 7.90267 2.91589 -2.40136
H17 11.34545 3.30505 -2.50872
H171 9.38845 3.38884 -2.90360
H172 10.34120 2.57887 -3.74272
H18 13.56767 0.01583 -1.94725
H181 11.84242 -0.72804 -2.31745
H182 13.15568 -0.46736 -0.86133
H191 6.26498 -1.47843 -0.50224
H192 6.47098 -0.03538 -0.28012
H2 7.93357 1.63204 0.29739
H20 7.53187 0.33144 2.62719
H201 8.25287 0.53626 3.71311
H22 10.11202 2.91775 3.70447
H25 6.57912 5.31601 5.95529
H26 6.26240 4.24443 3.63907
H27 10.34377 -2.16737 0.52075
H3 10.02448 -0.01676 0.29616
H5 9.62792 -2.31260 3.15657
H6 7.59368 -2.88796 1.83619
H61 8.62367 -3.69793 1.95959
H7 7.97478 -3.01923 -0.31714
O1 9.04597 3.72214 -0.17646
O10 9.44510 -0.39009 -3.78344
O100 10.45965 -2.33960 -4.29185
O13 13.29215 1.56781 0.10489
O2 8.93267 2.10406 1.85964
O20 6.85465 2.98106 1.71403
O4 10.51373 0.03259 2.73084
O40 11.70337 -1.53429 1.66220
O5 7.92843 -1.46819 3.47865
O7 10.04508 -2.73621 -0.22829
O9 7.55763 -1.35740 -2.35447
the output from the sample:
ATOM 1 C1 R01 1 9.695 2.447 -0.336
ATOM 2 C10 R01 1 9.759 -0.549 -2.391
ATOM 3 C11 R01 1 10.424 0.695 -1.921
ATOM 4 C12 R01 1 11.678 0.683 -1.446
ATOM 5 C13 R01 1 12.203 1.890 -0.734
ATOM 6 C14 R01 1 11.114 2.532 0.133
ATOM 7 C15 R01 1 9.600 1.993 -1.832
ATOM 8 C16 R01 1 8.155 1.883 -2.250
ATOM 9 C17 R01 1 10.223 3.129 -2.727
ATOM 10 C18 R01 1 12.620 -0.469 -1.615
ATOM 11 C19 R01 1 6.798 -0.805 0.147
ATOM 12 C2 R01 1 8.819 1.514 0.566
ATOM 13 C20 R01 1 8.119 -0.101 3.128
ATOM 14 C21 R01 1 8.178 3.447 3.622
ATOM 15 C22 R01 1 9.355 3.250 4.313
ATOM 16 C23 R01 1 9.556 3.826 5.468
ATOM 17 C24 R01 1 8.570 4.522 6.118
ATOM 18 C25 R01 1 7.362 4.729 5.494
ATOM 19 C26 R01 1 7.151 4.189 4.209
ATOM 20 C27 R01 1 7.895 2.857 2.326
ATOM 21 C28 R01 1 11.629 -0.598 2.415
ATOM 22 C29 R01 1 12.816 0.027 3.119
ATOM 23 C3 R01 1 9.206 -0.010 0.673
ATOM 24 C30 R01 1 9.862 -1.377 -4.622
ATOM 25 C31 R01 1 9.401 -1.042 -6.048
ATOM 26 C4 R01 1 9.280 -0.455 2.115
ATOM 27 C5 R01 1 8.953 -1.945 2.554
ATOM 28 C6 R01 1 8.415 -2.902 1.535
ATOM 29 C7 R01 1 8.680 -2.476 0.143
ATOM 30 C8 R01 1 8.312 -0.964 -0.160
ATOM 31 C9 R01 1 8.451 -0.919 -1.705
ATOM 32 H10 R01 1 10.408 -1.274 -1.857
ATOM 33 H13 R01 1 12.530 2.712 -1.464
ATOM 34 H14 R01 1 11.348 3.494 0.206
ATOM 35 H141 R01 1 10.969 2.098 0.971
ATOM 36 H16 R01 1 8.134 1.408 -3.080
ATOM 37 H161 R01 1 7.619 1.296 -1.538
ATOM 38 H162 R01 1 7.903 2.916 -2.401
ATOM 39 H17 R01 1 11.345 3.305 -2.509
ATOM 40 H171 R01 1 9.388 3.389 -2.904
ATOM 41 H172 R01 1 10.341 2.579 -3.743
ATOM 42 H18 R01 1 13.568 0.016 -1.947
ATOM 43 H181 R01 1 11.842 -0.728 -2.317
ATOM 44 H182 R01 1 13.156 -0.467 -0.861
ATOM 45 H191 R01 1 6.265 -1.478 -0.502
ATOM 46 H192 R01 1 6.471 -0.035 -0.280
ATOM 47 H2 R01 1 7.934 1.632 0.297
ATOM 48 H20 R01 1 7.532 0.331 2.627
ATOM 49 H201 R01 1 8.253 0.536 3.713
ATOM 50 H22 R01 1 10.112 2.918 3.704
ATOM 51 H25 R01 1 6.579 5.316 5.955
ATOM 52 H26 R01 1 6.262 4.244 3.639
ATOM 53 H27 R01 1 10.344 -2.167 0.521
ATOM 54 H3 R01 1 10.024 -0.017 0.296
ATOM 55 H5 R01 1 9.628 -2.313 3.157
ATOM 56 H6 R01 1 7.594 -2.888 1.836
ATOM 57 H61 R01 1 8.624 -3.698 1.959
ATOM 58 H7 R01 1 7.975 -3.019 -0.317
ATOM 59 O1 R01 1 9.046 3.722 -0.176
ATOM 60 O10 R01 1 9.445 -0.390 -3.783
ATOM 61 O100 R01 1 10.460 -2.340 -4.292
ATOM 62 O13 R01 1 13.292 1.568 0.105
ATOM 63 O2 R01 1 8.933 2.104 1.860
ATOM 64 O20 R01 1 6.855 2.981 1.714
ATOM 65 O4 R01 1 10.514 0.033 2.731
ATOM 66 O40 R01 1 11.703 -1.534 1.662
ATOM 67 O5 R01 1 7.928 -1.468 3.479
ATOM 68 O7 R01 1 10.045 -2.736 -0.228
ATOM 69 O9 R01 1 7.558 -1.357 -2.354
Hope this helps!
Jeanne Bundens
Bryn Mawr College
jbundens@cc.brynmawr.edu