TRICHECHUS MANATUS LATIROSTRIS

CONCLUSIONS



The main objective of this work was to annotate, as accurately and precisely as possible, the selenoproteins present in the genome of Trichechus manatus latirostris using various bioinformatic resources.


The different families of selenoproteins in eukaryotes show a high degree of homology. Thanks to this feature, selenoproteins described in one organism can be used to identify them in another. Because selenoproteins had not yet been recorded for the manatee, we used a well-described and closely related species, Loxodonta africana, as the reference to identify them.


At the end of the process we were able to predict and characterize 21 selenoproteins, 2 cysteine homologues and 6 elements of the selenoprotein synthesis machinery.


Specifically, the selenoproteins identified are: Trx family, DI family, GPx family, Sel15, SelH, SelI, SelK, SelM, SelN, SelO family, SelP, SelR family, SelS, SelT, SelU family, SelV and SelW family. We could not find any cysteine homologues. As the selenoprotein synthesis machinery has great biological importance, it is, as we would expect, well conserved. In the genome of Trichechus manatus latirostris we were able to identify the following proteins: eEFSec, SBP2, SECp43, SPS1, SPS2 and PSTK. Regarding the presence of SECIS elements, in some cases the hit showed homology with a cysteine yet still retained a SECIS element, as in all the contigs of GPx, Sel15 and SelT. Conversely, for some selenoproteins (SelW and SelP) we could not predict SECIS elements.


Regarding the GPx family, we could not predict complete homologous proteins in Trichechus manatus latirostris, only contigs that might correspond to some of them. In the great majority of these contigs we found a selenocysteine residue and a SECIS element, so we can affirm that they are selenoproteins. Further studies should clarify whether these contigs correspond to real GPx selenoproteins.


For the protein families, the fact that the phylogenies match the predictions supports that each contig was correctly assigned according to the established criteria.


The finding in the SelR family is remarkable. The fragment containing the selenocysteine, which appears repeated several times in the Trichechus manatus latirostris genome, should be studied in depth. If it is not a computational error, it could provide more information about selenocysteine and selenoproteins.


Concerning SelV, in SelenoDB and Ensembl we found this selenoprotein studied only in humans. Nevertheless, we were able to predict a homologous selenoprotein in our genome.


The remaining selenoproteins and the proteins without selenocysteine were found as expected, with correct alignments. The only problems were some contigs of the GPx family containing a selenocysteine that does not align with Loxodonta africana.
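
The check on whether the selenocysteine of the reference lines up with a selenocysteine, a cysteine or nothing at all in the predicted sequence can be sketched as follows. This is only a minimal illustration, not the procedure used in this work: the function name `sec_columns` and the toy sequences are hypothetical, and it assumes a pairwise alignment given as two equal-length strings in which selenocysteine is written as `U` and gaps as `-`.

```python
def sec_columns(query_aln, target_aln):
    """Return (column, target_residue, status) for every 'U' in the aligned query.

    Both arguments are rows of the same pairwise alignment (hypothetical input
    format: equal-length strings, 'U' = selenocysteine, '-' = gap).
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for col, (q, t) in enumerate(zip(query_aln, target_aln)):
        if q != "U":
            continue  # only columns holding a selenocysteine in the query matter
        if t == "U":
            status = "selenocysteine conserved"
        elif t == "C":
            status = "cysteine homologue"
        elif t == "-":
            status = "not aligned (gap)"
        else:
            status = "other residue"
        results.append((col, t, status))
    return results


# Toy sequences, invented for illustration only.
query = "MKAUGLV-T"
target = "MKACGLVQT"
for col, residue, status in sec_columns(query, target):
    print(col, residue, status)
```

A contig whose selenocysteine column is reported as "not aligned (gap)" or "other residue" would be one of the problematic cases mentioned above and would need manual inspection.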


Regarding the TR family, we could predict three homologous proteins in Trichechus manatus latirostris. We found three different contigs that might correspond to TR1, TR2 and TR3. The same contigs were retrieved for all TR queries, so we cannot tell which one corresponds to TR1, TR2 or TR3. In all of these contigs we found a selenocysteine residue and a SECIS element, so we can affirm that they are all selenoproteins. Further studies should clarify whether these contigs correspond to real TR selenoproteins.


The same happens in the DI protein family. For each of the queries blasted against the manatee genome, the same contigs appeared, so again we could not say which contig was DI1, DI2 or DI3. We can conclude that only two of them contain selenocysteine, while the other does not. Further studies are needed to clarify whether these contigs correspond to real DI selenoproteins.
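
One possible way to explore this ambiguity in the TR and DI families, which was not part of this work, is to compare for each contig the score obtained with each query and keep the best-scoring one, flagging ties as unresolved. The sketch below assumes hits are available as `(query, contig, bitscore)` tuples; the function name `best_query_per_contig`, the contig names and the scores are all hypothetical.

```python
from collections import defaultdict


def best_query_per_contig(hits):
    """Map each contig to its best-scoring query, or None when the top score is tied.

    hits: iterable of (query, contig, bitscore) tuples (hypothetical input format).
    """
    by_contig = defaultdict(list)
    for query, contig, score in hits:
        by_contig[contig].append((score, query))

    assignment = {}
    for contig, scored in by_contig.items():
        top = max(score for score, _ in scored)
        winners = sorted(q for score, q in scored if score == top)
        # A single winner is a tentative assignment; a tie stays unresolved.
        assignment[contig] = winners[0] if len(winners) == 1 else None
    return assignment


# Invented scores, for illustration only.
example_hits = [
    ("DI1", "contig_A", 310.0),
    ("DI2", "contig_A", 150.0),
    ("DI1", "contig_B", 200.0),
    ("DI2", "contig_B", 200.0),
]
print(best_query_per_contig(example_hits))
```

Even a clear best score would only be a tentative assignment; a phylogeny that includes the candidate sequences remains the stronger criterion.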


Regarding msrA, we compared this protein with the msrA of the elephant. The results shown correspond to the alignment with the elephant msrA protein, which presents the highest similarity (90%). However, as the T-Coffee alignment shows, the overall alignment is poor: only a short region in the middle contains well-conserved amino acids. Therefore, we could not predict whether there is any selenocysteine or SECIS element.



Limitations

The procedure used has some limitations. Firstly, it requires the genomic information of the species to be well annotated, which is not always the case. It also needs robust, well-annotated databases that include diverse species. For example, in SelenoDB we could not find the various isoforms for every protein, and sometimes the name was wrongly assigned. In our case the most similar species was Loxodonta africana, whose relatedness is limited to the same superorder. We think that a greater effort to improve the annotation of selenoproteins is required.


Secondly, the analysis is based only on homology with the selenoproteins of other species, so we cannot identify novel selenoproteins. For this reason the selenoproteome described is not complete: some proteins are not predicted properly, and there may be selenoproteins that have not been described yet. In addition, the restrictions applied when choosing contigs to ensure statistical significance mean that selenoproteins that have diverged beyond a certain threshold are lost.
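
The effect of such a significance cut-off can be illustrated with a short sketch. It assumes hits in the standard 12-column BLAST tabular format (`-outfmt 6`), where the e-value is the eleventh column; the function name `split_by_evalue`, the threshold of 1e-5 and the two example lines are hypothetical and are not the values used in this work.

```python
def split_by_evalue(tabular_lines, max_evalue=1e-5):
    """Split BLAST tabular hits into (kept, dropped) lists by e-value.

    Each record is (query_id, subject_id, evalue). The default threshold is an
    assumed value chosen only for illustration.
    """
    kept, dropped = [], []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of outfmt 6 is the e-value
        record = (fields[0], fields[1], evalue)
        if evalue <= max_evalue:
            kept.append(record)
        else:
            dropped.append(record)
    return kept, dropped


# Invented hits, for illustration only.
example = [
    "SelK\tcontig_1\t85.0\t90\t13\t0\t1\t90\t1200\t1469\t2e-40\t150",
    "SelX\tcontig_2\t31.0\t60\t40\t1\t5\t64\t300\t479\t0.5\t28.1",
]
kept, dropped = split_by_evalue(example)
print("kept:", kept)
print("dropped:", dropped)
```

The second hit is discarded even though it might represent a highly diverged homologue, which is exactly the kind of loss described above.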


Overall, the great importance of bioinformatic tools in sequence analysis must be emphasized, as they allow us to analyse a great deal of information through relatively simple methods. This project has given us a global view of how informatics is applied to the study of biological problems, and it has allowed us to become familiar with data collection, programming and the interpretation of the extracted information.


Further studies are required to determine whether the sequences we found correspond to proteins. Specifically, in the case of GPx, where almost all the contigs are also found for the other members of the family, it should be determined which combination of contigs results in a protein. More research is needed to characterize selenoproteomes optimally and to unveil the still unknown roles of some selenoproteins.