GENSCAN 1.0    Date run: 14-Mar-103    Time: 07:35:42

Sequence hg13_dna : 242888 bp : 38.87% C+G : Isochore 1 ( 0 - 43 C+G%)

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 1.07 PlyA -    990    985    6                               1.05
 1.06 Term -   8654   8583   72  2  0   66   41   132 0.371   3.23
 1.05 Intr -  16651  16628   24  1  0   81  105    42 0.666   2.50
 1.04 Intr -  18652  18508  145  0  1   91   44    43 0.483  -0.44
 1.03 Intr -  20407  20278  130  2  1   84   61    47 0.736   0.43
 1.02 Intr -  22532  22488   45  0  0   73  106    73 0.315   5.06
 1.01 Init -  29188  28792  397  1  1   65   64   252 0.884  16.14
 1.00 Prom -  34098  34059   40                              -1.95

 2.07 PlyA -  34143  34138    6                               1.05
 2.06 Term -  39999  39839  161  1  2  109   35    76 0.789   1.52
 2.05 Intr -  42468  42334  135  1  0   83   71    86 0.494   6.02
 2.04 Intr -  52404  52290  115  0  1   46   52    88 0.166   0.10
 2.03 Intr -  53055  53014   42  0  0   75   98    38 0.151   1.02
 2.02 Intr -  63941  63847   95  0  2   65   27   114 0.317   1.76
 2.01 Init -  65131  65083   49  1  1   67  108     9 0.647   1.96
 2.00 Prom -  66240  66201   40                              -4.85

 3.00 Prom +  75157  75196   40                              -3.45
 3.01 Init +  78144  78202   59  2  2   92   84    23 0.297   3.15
 3.02 Intr +  80695  80875  181  1  1   76   37   160 0.518   8.65
 3.03 Term +  89629  89739  111  0  0   89   41    77 0.440   0.68
 3.04 PlyA +  91017  91022    6                               1.05

 4.02 PlyA -  91318  91313    6                               1.05
 4.01 Sngl - 100662  98971 1692  0  0   83   38   600 0.968  49.32
 4.00 Prom - 108203 108164   40                              -3.65

 5.03 PlyA - 108486 108481    6                              -0.45
 5.02 Term - 109619 108964  656  0  2   35   38   270 0.651   9.87
 5.01 Init - 110473 110308  166  1  1   70   72    73 0.817   3.75
 5.00 Prom - 114244 114205   40                              -3.65

 6.02 PlyA - 115008 115003    6                               1.05
 6.01 Sngl - 115531 115052  480  1  0   60   48   164 0.503   5.43
 6.00 Prom - 116397 116358   40                              -6.15

 7.04 PlyA - 116986 116981    6                               1.05
 7.03 Term - 119698 119541  158  2  2   71   43   154 0.627   6.31
 7.02 Intr - 123937 123856   82  1  1   83   78    47 0.401   1.49
 7.01 Init - 137065 136823  243  1  0   57   87   114 0.442   5.98
 7.00 Prom - 140378 140339   40                              -5.75

 8.00 Prom + 141037 141076   40                              -5.95
 8.01 Init + 149470 149475    6  0  0   58   92     0 0.261  -1.67
 8.02 Intr + 153372 155858 2487  2  0  111   96  1876 0.868 177.08
 8.03 Intr + 156153 156240   88  2  1   85   32    36 0.385  -3.68
 8.04 Intr + 157438 157550  113  2  2   55   98    74 0.535   4.28
 8.05 Intr + 158406 158609  204  2  0   69  101   212 0.877  19.07
 8.06 Intr + 181200 181307  108  2  0   79   31    89 0.466   1.86
 8.07 Intr + 186572 186647   76  1  1  106  111    67 0.769   8.97
 8.08 Term + 193102 193259  158  2  2   73   42    94 0.266   0.41
 8.09 PlyA + 193287 193292    6                               1.05

 9.00 Prom + 193341 193380   40                              -6.95
 9.01 Init + 195978 196244  267  2  0   40   12   328 0.869  17.43
 9.02 Term + 196425 196490   66  2  0  111   39   119 0.934   6.26
 9.03 PlyA + 196822 196827    6                              -3.24

10.06 PlyA - 197195 197190    6                               1.05
10.05 Term - 198195 197629  567  0  0   28   36   287 0.876  10.93
10.04 Intr - 206724 206550  175  2  1   27   31   144 0.017   1.62
10.03 Intr - 211384 211236  149  1  2   75   29   106 0.087   1.51
10.02 Intr - 211863 211742  122  1  2   58  103     8 0.051  -1.41
10.01 Init - 217635 217269  367  0  1   68   12   211 0.026   8.73
10.00 Prom - 218114 218075   40                              -4.75

11.00 Prom + 220126 220165   40                              -4.65
11.01 Init + 223582 223833  252  0  0   53   67   119 0.124   3.69
11.02 Term + 233431 233610  180  0  0   85   55   213 0.993  14.33
11.03 PlyA + 233621 233626    6                               1.05

12.04 PlyA - 233974 233969    6                               1.05
12.03 Term - 234554 234427  128  0  2   88   45   112 0.694   4.36
12.02 Intr - 242026 241859  168  2  0   15   86   102 0.191   1.70
12.01 Init - 242713 242458  256  1  1   49   40   132 0.181   1.94
Predicted peptide sequence(s):
>hg13_dna|GENSCAN_predicted_peptide_1|270_aa
MLQWFLLAVGFLLPPPLPQGENRIQGRLRVGDERPSPTYLDDAAPAGQAARLALPGEGTD
DRAGGRGERNAAGSPGRASPGFVPGHAEAAVSVGNPAAVIARPPVDEATSPDTVFPHSGF
QGALGTEGRARGASVMAVAAMSEIPGEKTYGIHMREKRQINLVTQKVAGQVSGGAPSHPL
LQKTLEGGLGKDKNGDLFPDLPPDLPARVDADQHLQTLTDNMNADWKESNWSSQIISPKA
KALVEPEVFSKNPEESEDEAVTQRFTEKTG
>hg13_dna|GENSCAN_predicted_peptide_2|198_aa
MDVRFTNSHFLKVSGAKAVPESGIHDQVTQVPIVPFTCNSVEDASEVKMRKLQEDMQSAQ
QTTANLCHSTKHAVLCIHGSRLMLFPVPEDYPPPPPSSGCGMTALRVAKLYLGTALSSPV
ILVVSSKPSSTLSYFTVWQPVMMRTGHQLVQGFSRSHHVQVSTAAEMFHMLIHRQQKTVK
GTALNSAGGSNQLVSEFD
>hg13_dna|GENSCAN_predicted_peptide_3|116_aa
MARLSFRSVSVGSLERRLARMFSNIRGFYPLADLYPLNASNTPISHDNKKVSPALPNVPS
GADYPWLRTTGIANAEEAGLSYVIRFTTFLFSTITSDPFVSQQMPVTKKNKNSGGP
>hg13_dna|GENSCAN_predicted_peptide_4|563_aa
MDKIDRLLARLIKKKREKNQIDAIKNDKGDITTDPTEIQTTIREYYKHLYANKLENLEEM
DKFLDTYTLPRPNQEEVESLNRPITGSEIAAIINSLPTKKSPGPDGFTAEFYQRYKEELV
PFLLKLFQSIEKEGILPNSFYEVSIILIPKPGRDTTKKENFRPISLMNIDAKILNKILAN
RIQQHIKKLIHHDQVGFIPGMQGWFNICKSINIIQHINRTKNKNHMIISIDAEKAFDKIQ
QPFMLKTLNKLGIDGTYLKIIRAIYDKPTANIILNGQKLEAFPLKTGTRQACPLSPLLFN
IVLEVLARAIRQEKEIKGIQLGKEEVKLSLFADDMIVYLENPIVSAQNLLKLISNFSKVS
GHKINVQKSQAFLYTNNRQTESQIMSELPFTIASKRIKYLGIQLTRDGKDLLKENYKPLL
NEIKEDTNKWKNIPCSWVGRTNMVKMAILPKVIYRFNAIPIKLPMTFFTELEKTTLKFIW
NQKRARIAKSILSQKNKAGGITLPDFKLYYKATVTKTAWYWYQNRDIDQWNRTEPSEIIP
HIYNHLIFDKPEKNKKWEERFPI
>hg13_dna|GENSCAN_predicted_peptide_5|273_aa
MDKFLDTYTLPRLNQEEVKSLNRPMTGSEIEAIINSILTKKKVQDQKDSRPNSTRAPNLL
KLISNFSKVSGYKINVQKSQAFLYTNNRQTESQIMSELPFTIASKRIKYLGIQLTRDGKD
LLKENYKPLLNEIKEDTNKWKNIPCSWIGRINIVKMAILPKVIYRFNAIPIKLPMTFFTE
LEKTTLKFIWNQKRARIAKRILSQKNKAGGITLPDFKLYYKATVTKTAWYWYQNRHIDQW
NRTEPSEIIPHIYNHLIFDKPDKNKKREKGFPI
>hg13_dna|GENSCAN_predicted_peptide_6|159_aa
MSELPFTIASKRIKYLGIQLTRDMKDLFKENYKPLLKEIKEDTNKWKNIPCSWVGRINIV
KMAISPKVIYRFSAIPFKLPMTFFTELEKTTLKFIWNQKRARIAKSILSQKNKAGGITLP
DFKLYYKATVTLVPKQRYRSMEQNRALRNNAAYLQISDL
>hg13_dna|GENSCAN_predicted_peptide_7|160_aa
MRSPAPANHRNGWNHQQLVNYITQGRMTSKVGQLKRVNVGINKHATSTGTALGKLKHMII
LPESDLINNVLSSKSQALEMQVRGASARPPLLGSEEPLCPASHSVQEGVLSKTGIKALQA
LCFSQPNTEYHQQPHHTRPEPKRATPPQILAEVSLLANGA
>hg13_dna|GENSCAN_predicted_peptide_8|1079_aa
MEDGTKQKRERKKTVSFSSMPTEKKISSASDCINSMVEGSELKKVRSNSRIYHRYFLLDA
DMQSLRWEPSKKDSEKAKIDIKSIKEVRTGKNTDIFRSNGISDQISEDCAFSVIYGENYE
SLDLVANSADVANIWVTGLRYLISYGKHTLDMLESSQDNMRTSWVSQMFSEIDVDNLGHI
TLCNAVQCIRNLNPGLKTSKIELKFKELHKSKDKAGTEVTKEEFIEVFHELCTRPEIYFL
LVQFSSNKEFLDTKDLMMFLEAEQGVAHINEEISLEIIHKYEPSKEGQEKGWLSIDGFTN
YLMSPDCYIFDPEHKKVCQDMKQPLSHYFINSSHNTYLIEDQFRGPSDITGYIRALKMGC
RSVELDVWDGPDNEPVIYTGHTMTSQIVFRSVIDIINKYAFFASEYPLILCLENHCSIKQ
QKVMVQHMKKLLGDKLYTTSPNVEESYLPSPDVLKGKILIKAKKLSSNCSGVEGDVTDED
EGAEMSQRMGKENMEQPNNVPVKRFQLCKELSELVSICKSVQFKEFQVSFQVQKYWEVCS
FNEVLASKYANENPGDFVNYNKRFLARVFPSPMRIDSSNMNPQDFWKCGCQIVAMNFQTP
GLMMDLNIGWFRQNGNCGYVLRPAIMREEVSFFSANTKDSVPGVSPQLLHIKIISGQNFP
KPKGSGAKGDVVDPYVYVEIHGIPADCAEQRTKTVHQNGDAPIFDESFEFQINLPELAMV
RFVVLDDDYIGDEFIGQYTIPFECLQTGYRHVPLQSLTGEVLAHASLFVHVAITNRRGGG
KPHKRGLSVRKGKKSREYASLRTLWIKTVDEVFKNAQPPIRDATDLRENMQYTWESLGFL
NIPYLEHKLMIYQNKKDIGQNQDPSVPEIAHLTLIMIIIGLLLLTVTQSMGLNISSFQNA
VVSFKELCGLSSVANLMQCMLAVSPRFLGPDNTPLVVLNLSEQYPTMELQGIVPEVLKKI
VTTYDMQLKDFSDFVTSLETATTEDAVATSVLSRTGKESSLEMIQSLKALIENADAVYEK
IVHCQKADNMILCLEKPKDSTKNLLELINKCSKAVGYKISMQKSVAFLYANSEQFEKEI
>hg13_dna|GENSCAN_predicted_peptide_9|110_aa
MDFGENQELMDTTPEELTEDNLTEMSASKPVPDNEEYVEEAVPPNKLISDNLEEGFLLFK
TAYDFFYDKDPPDDTGTETKAKSGRSTGTPTLLEDDEDEDLCDDPLPRDK
>hg13_dna|GENSCAN_predicted_peptide_10|459_aa
MVDLDGFMTTVGKVNADGMKITRELELEIEPEEVTKLLQSHHKTVMHEELLLMNEQRKWF
LEMESPPGEDAVNIAVITAKELEYYIHLIDKGMAGFEKTDFQRSSTVGKMLLNSIARYRE
ICVADHVQCRKCYTKRGKHAERGCKITSIRNNSIDTFKMVIFQRREAQQASEMGSCQGSR
KKGWDQHLDLDSQKADLSSTEDTVETRDRKNAKVWLLAYVARALPAGPMESHQADTGENK
QEGRFRTVSRLLLPPLLRDSDKNSHCENNLQKLILRSRALLEDVKAALELVMGKSWNSLE
SSEKDRKMRESLELPRDLLNCCDQNTDSDMDNEVQAQEVSDGNEELIGNWSRGHFCYALA
KSLAALYPCSRDLWNLELESDDLGYLAEEIPKQQSVQDVAWLLLTTYAHMHRKRNDLKLE
FIFNRETEHKSWKNLQPGHVVEKKSPFSEEEFKQAAEIL
>hg13_dna|GENSCAN_predicted_peptide_11|143_aa
MQPTIPSSSRHCLEPQKHLFGSSKNQGQELCLEGSPFRSCTEPFIPLWQLLCTLWGGGQG
GGKAVFLLWGDKLICSSCSMPIKRGQADLLKYAKNETLENLKQIHFAAVSCGLNKPGTEN
ADVQKPRRSLEVIPEKANDETGE
>hg13_dna|GENSCAN_predicted_peptide_12|183_aa
MNIDAKILNKILANRIQQHIKKLIHHDQVGFIPGMQGWFNIRKSINVIQHINRAKDKNHM
IISIDAEKAFDKIQQLFMLKALNKLENKIPRNLTYKGCEGPLQGELQTTAQGNKRGYKQT
EEHSMLMGRKNQYRENGHTAQDSEVHITTDLIQKHQEGSCGYLGSLYSQLLLLKAVETLP
KAK
Explanation
Gn.Ex : gene number, exon number (for reference)
Type : Init = Initial exon (ATG to 5' splice site)
       Intr = Internal exon (3' splice site to 5' splice site)
       Term = Terminal exon (3' splice site to stop codon)
       Sngl = Single-exon gene (ATG to stop)
       Prom = Promoter (TATA box / initiation site)
       PlyA = poly-A signal (consensus: AATAAA)
S : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End : end point of exon or signal (numbered on input strand)
Len : length of exon or signal (bp)
Fr : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph : net phase of exon (exon length modulo 3)
I/Ac : initiation signal or 3' splice site score (tenth bit units)
Do/T : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P : probability of exon (sum over all parses containing exon)
Tscr : exon score (depends on length, I/Ac, Do/T and CodRg scores)
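
The column definitions above can be checked mechanically. The following is a minimal Python sketch, not part of GENSCAN itself: the names Feature, parse_row and is_consistent are hypothetical, and the sketch assumes that exon rows carry all 13 whitespace-separated columns while Prom and PlyA rows carry only 7 (Gn.Ex through Len, then Tscr). It also assumes Len = |End - Begin| + 1, which is what the coordinates in the table suggest.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    gene: int
    exon: int
    kind: str               # Init, Intr, Term, Sngl, Prom or PlyA
    strand: str             # "+" or "-"
    begin: int
    end: int
    length: int
    frame: Optional[int]    # exon-only columns are None for Prom/PlyA rows
    phase: Optional[int]
    i_ac: Optional[int]
    do_t: Optional[int]
    codrg: Optional[int]
    prob: Optional[float]
    tscr: float


def parse_row(line: str) -> Feature:
    """Parse one whitespace-separated row of the prediction table."""
    f = line.split()
    gene, exon = (int(x) for x in f[0].split("."))
    kind, strand = f[1], f[2]
    begin, end, length = int(f[3]), int(f[4]), int(f[5])
    if len(f) == 13:        # exon row: every column is present
        fr, ph, i_ac, do_t, codrg = (int(x) for x in f[6:11])
        return Feature(gene, exon, kind, strand, begin, end, length,
                       fr, ph, i_ac, do_t, codrg, float(f[11]), float(f[12]))
    if len(f) == 7:         # Prom / PlyA row: only Tscr follows Len
        return Feature(gene, exon, kind, strand, begin, end, length,
                       None, None, None, None, None, None, float(f[6]))
    raise ValueError(f"unexpected column count: {line!r}")


def is_consistent(feat: Feature) -> bool:
    """Check Len against the coordinates and Ph against Len modulo 3."""
    span_ok = feat.length == abs(feat.end - feat.begin) + 1
    phase_ok = feat.phase is None or feat.phase == feat.length % 3
    return span_ok and phase_ok


row = " 1.06 Term -   8654   8583   72  2  0   66   41   132 0.371   3.23"
feat = parse_row(row)
print(feat.kind, feat.length, feat.phase, is_consistent(feat))
```

For exon 1.06 this confirms Len = 8654 - 8583 + 1 = 72 and Ph = 72 mod 3 = 0. The Fr column is not checked here because, for internal and terminal exons, it cannot be derived from the coordinates of a single row alone.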
Comments
The SCORE of a predicted feature (e.g., exon or splice site) is a
log-odds measure of the quality of the feature based on local sequence
properties. For example, a predicted 5' splice site with
score > 100 is strong; 50-100 is moderate; 0-50 is weak; and
below 0 is poor (more than likely not a real donor site).
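
As an illustration of the score bands just described, a small Python sketch follows. The function name classify_signal_score is hypothetical, and the handling of the exact boundary values is an assumption: the text does not say which band 0, 50 or 100 belong to, so here 100 counts as moderate (strong requires score > 100), 50 as moderate, and 0 as weak.

```python
def classify_signal_score(score: float) -> str:
    """Map a 5' splice site score (tenth bit units) to a quality band.

    Boundary handling is an assumption; the source only gives the ranges
    "> 100", "50-100", "0-50" and "below 0".
    """
    if score > 100:
        return "strong"
    if score >= 50:
        return "moderate"
    if score >= 0:
        return "weak"
    return "poor"


# Do/T values of three internal exons from gene 1 in the table
for exon, do_t in [("1.05", 105), ("1.03", 61), ("1.04", 44)]:
    print(exon, do_t, classify_signal_score(do_t))
```

Note that Do/T is a donor (5' splice site) score only for Init and Intr exons; for Term and Sngl exons the same column holds the termination signal score.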
The PROBABILITY of a predicted exon is the estimated probability under
GENSCAN's model of genomic sequence structure that the exon is correct.
This probability depends in general on global as well as local sequence
properties, e.g., it depends on how well the exon fits with neighboring
exons. It has been shown that predicted exons with higher probabilities
are more likely to be correct than those with lower probabilities.
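
One way to act on the exon probabilities is to keep only exons above a chosen cutoff. The Python sketch below does this for the coding exons of gene 10, using the P values from the table. The function name high_confidence and the cutoff of 0.5 are illustrative assumptions, not a GENSCAN recommendation.

```python
from typing import List, Tuple

# (Gn.Ex, P) pairs copied from the rows of gene 10 in the table
GENE_10: List[Tuple[str, float]] = [
    ("10.05", 0.876),
    ("10.04", 0.017),
    ("10.03", 0.087),
    ("10.02", 0.051),
    ("10.01", 0.026),
]


def high_confidence(exons: List[Tuple[str, float]],
                    min_p: float = 0.5) -> List[str]:
    """Return the labels of exons whose probability is at least min_p."""
    return [label for label, p in exons if p >= min_p]


print(high_confidence(GENE_10))
```

With this cutoff only the terminal exon 10.05 is kept, which suggests that the rest of the gene 10 structure is weakly supported under the model.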