geneplot

geneplot.createGFFdb(gff3file)

Creates a SQLite (sqlite3) database from a GFF3 file using the gffutils Python package.

Parameters

gff3file (str) – path to the GFF3 file
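gffutils builds its database with gffutils.create_db. To show what "a sqlite3 database of a GFF3 file" involves without depending on gffutils, here is a minimal sketch that uses only the standard-library sqlite3 module. The table layout and the helper gff3_to_sqlite are hypothetical and do not match the gffutils schema or what createGFFdb produces.

```python
import sqlite3
from urllib.parse import unquote


def gff3_to_sqlite(lines, db_path=":memory:"):
    """Hypothetical helper: load GFF3 feature lines into one SQLite table.

    Illustration only; this is not the gffutils schema used by createGFFdb.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "id TEXT, seqid TEXT, source TEXT, featuretype TEXT, start_pos INTEGER, "
        "end_pos INTEGER, score TEXT, strand TEXT, phase TEXT, parent TEXT)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature line
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 escapes reserved characters as %XX
        rows.append((
            attrs.get("ID"), cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
            cols[5], cols[6], cols[7],
            attrs.get("Parent"),  # simplification: multiple parents stay comma-joined
        ))
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Made-up three-feature annotation
gff = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=mRNA1",
]
conn = gff3_to_sqlite(gff)
children = conn.execute("SELECT id FROM features WHERE parent = ?", ("gene1",)).fetchall()
print(children)  # [('mRNA1',)]
```

Storing the parent ID next to each feature is what makes parent/child queries (gene to mRNA to exon) a single indexed lookup instead of a rescan of the file.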

class geneplot.genome(gff3file, iprfile=None, vcffiles=None)

Bases: object

Instantiates a genome object whose genome-associated data are passed as paths to the data sources and stored as class attributes. The data include the GFF3 genome annotation file (positional argument), the InterproScan output listing protein domains identified in protein-coding genes (keyword argument), and the directory of VCF files with polymorphisms (keyword argument).

Parameters
  • gff3file (str) – path to the GFF3 file

  • iprfile (str) – path to the InterproScan’s output file

  • vcffiles (str) – path to the VCF files directory
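The genome constructor stores paths rather than loaded data, and vcffiles names a directory rather than a single file. A minimal sketch of such a path container, assuming that VCF files are recognised by a .vcf or .vcf.gz suffix; GenomeSources and vcf_paths are hypothetical names, not geneplot's implementation.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class GenomeSources:
    """Hypothetical stand-in mirroring genome(gff3file, iprfile=None, vcffiles=None)."""

    gff3file: str                   # positional: GFF3 genome annotation
    iprfile: Optional[str] = None   # keyword: InterproScan output file
    vcffiles: Optional[str] = None  # keyword: directory holding the VCF files

    def vcf_paths(self) -> List[Path]:
        """Return the VCF files (.vcf or .vcf.gz) in vcffiles, sorted by name (assumed suffixes)."""
        if self.vcffiles is None:
            return []
        return sorted(
            p for p in Path(self.vcffiles).iterdir()
            if p.is_file() and p.name.endswith((".vcf", ".vcf.gz"))
        )


# Usage on a throwaway directory with two VCF files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("spB.vcf.gz", "spA.vcf", "notes.txt"):
        (Path(tmp) / name).touch()
    src = GenomeSources("annotation.gff3", vcffiles=tmp)
    found = [p.name for p in src.vcf_paths()]

print(found)  # ['spA.vcf', 'spB.vcf.gz']
```

Keeping only paths makes construction cheap; files are opened later by the methods that need them.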

class gene(mRNAid, proteinid=None, description=None)

Bases: object

Instantiates a gene object whose plot() method represents the intron/exon structure of the gene from the GFF3 file, the protein domain topology from the InterproScan output, and single nucleotide polymorphisms (SNPs) from the VCF files.

Parameters
  • mRNAid (str) – gene identifier (ID) according to the GFF3 file annotations.

  • proteinid (str) – protein identifier (ID) from the InterproScan output

  • description (str) – user-defined description of the gene
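The intron/exon structure drawn by plot() can be derived from the exon coordinates of one transcript alone: introns are the gaps between consecutive exons. A minimal sketch, assuming 1-based inclusive (start, end) pairs as in GFF3; introns_from_exons is a hypothetical helper, not part of geneplot.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GFF3 convention)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Hypothetical helper: intron intervals between consecutive exons of one transcript."""
    ordered = sorted(exons)  # GFF3 does not guarantee exon order
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # a gap of at least one nucleotide is an intron
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Exons of a made-up three-exon transcript, deliberately unsorted
print(introns_from_exons([(500, 650), (100, 300), (800, 900)]))
# [(301, 499), (651, 799)]
```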

getsnppos(sp, vcffiles, onlycoding=True)

Selects the SNPs that overlap the genome coordinates of the gene (class object) from the VCF file whose sample ID matches the “sp” parameter. SnpEff annotations of these SNPs are retrieved from the VCF file; if absent, the selected SNPs are annotated de novo.

Parameters
  • sp (str) – Species ID to select SNP data from the VCF file

  • vcffiles (str) – path to the VCF files directory

  • onlycoding (boolean) – if True, plot only SNPs located in coding regions of the gene
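The selection step of getsnppos can be sketched with the standard VCF layout: sample IDs follow the FORMAT column of the #CHROM header, and each sample's genotype is the GT key of its FORMAT-keyed values. A minimal sketch, assuming plain-text VCF lines, gene and coding intervals supplied by the caller, and that only single-nucleotide variants count as SNPs; select_snps and its arguments are hypothetical, and the sketch does not reproduce the SnpEff or de novo annotation step.

```python
def select_snps(vcf_lines, sp, chrom, start, end, cds=None):
    """Hypothetical helper: (pos, ref, alt, genotype) for SNPs of sample `sp` in chrom:start-end.

    Coordinates are 1-based and inclusive. If `cds` (a list of (start, end)
    intervals) is given, only SNPs inside one of them are kept, which is the
    role assumed here for onlycoding=True.
    """
    samples = []
    hits = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample IDs follow the FORMAT column
            if sp not in samples:
                return []  # this VCF file does not hold the requested sample
            continue
        pos, ref, alt = int(cols[1]), cols[3], cols[4]
        if cols[0] != chrom or not start <= pos <= end:
            continue  # outside the gene
        if cds is not None and not any(s <= pos <= e for s, e in cds):
            continue  # outside every coding interval
        if len(ref) != 1 or any(len(a) != 1 or a in (".", "*") for a in alt.split(",")):
            continue  # keep single-nucleotide variants only
        keys = cols[8].split(":")
        values = cols[9 + samples.index(sp)].split(":")
        genotype = dict(zip(keys, values)).get("GT", ".")
        hits.append((pos, ref, alt, genotype))
    return hits


# Made-up two-sample VCF
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tspA\tspB",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t400\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1",
    "chr1\t950\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t0/0",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\tGT\t0/1\t0/1",
]
print(select_snps(vcf, "spB", "chr1", 100, 900))
# [(150, 'A', 'G', '1/1'), (400, 'C', 'T', '0/1')]
```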

plot(domtype='Pfam', sp=None, onlycoding=True)

Plots the features of the gene (class object) previously generated by the other methods of the class: exons and UTRs (the latter only if present in the GFF3 file), Interpro protein domains, and SNPs. Each SNP is labelled with its genotype from the VCF file and colored by its SnpEff impact: LOW (green), MODERATE (amber), MODIFIER (pink), HIGH (red). The plot is saved as a PNG image.

Parameters
  • domtype (str) – protein domain type (as specified in the InterproScan output) to be plotted (Pfam by default).

  • sp (str) – Species ID to select SNP data from the VCF file.

  • onlycoding (boolean) – if True, plot only SNPs located in coding regions of the gene
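The impact-based coloring can be sketched from the standard SnpEff ANN INFO field, where annotations are comma-separated and the third |-separated sub-field of each is Annotation_Impact. A minimal sketch, assuming the most severe impact wins when a SNP carries several annotations (ranked HIGH, MODERATE, LOW, MODIFIER); snp_colour and the hex code chosen for amber are illustrative, not geneplot's.

```python
# Commonly used SnpEff impact ranking, most to least severe (assumed tie-break rule)
SEVERITY = ["HIGH", "MODERATE", "LOW", "MODIFIER"]
# Colors as listed for plot(); amber given as a hex code (illustrative choice)
COLOURS = {"LOW": "green", "MODERATE": "#FFBF00", "MODIFIER": "pink", "HIGH": "red"}


def snp_colour(info):
    """Hypothetical helper: color for a VCF INFO string, from its most severe ANN impact."""
    impacts = set()
    for field in info.split(";"):
        if field.startswith("ANN="):
            for annotation in field[len("ANN="):].split(","):
                parts = annotation.split("|")
                if len(parts) > 2:
                    impacts.add(parts[2])  # third sub-field is Annotation_Impact
    for impact in SEVERITY:
        if impact in impacts:
            return COLOURS[impact]
    return None  # no SnpEff annotation; the SNP would need de novo annotation


# Made-up INFO strings
print(snp_colour("DP=10;ANN=G|missense_variant|MODERATE|g1,G|upstream_gene_variant|MODIFIER|g2"))
# #FFBF00
```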

proteindoms(iprfile, proteinid)

Builds a dictionary of the protein domains of proteinid from the InterproScan output file provided as input to the gene class.

Parameters
  • iprfile (str) – path to the InterproScan’s output file

  • proteinid (str) – protein identifier (ID) from the InterproScan output
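InterproScan's tab-separated output has one domain hit per row: protein accession, sequence MD5, sequence length, analysis (for example Pfam), signature accession, signature description, start, stop, score, status and date, optionally followed by InterPro, GO and pathway columns. A minimal sketch that groups one protein's hits by analysis, assuming TSV input; the dictionary layout and parse_interproscan_tsv are hypothetical, not the structure proteindoms returns. Presumably the domtype parameter of plot() selects one such analysis.

```python
from collections import defaultdict


def parse_interproscan_tsv(lines, proteinid):
    """Hypothetical helper: domain hits of one protein, grouped by analysis name."""
    domains = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11 or cols[0] != proteinid:
            continue  # malformed row or another protein
        domains[cols[3]].append({          # column 4: analysis (Pfam, SMART, ...)
            "accession": cols[4],          # column 5: signature accession
            "description": cols[5],        # column 6: signature description
            "start": int(cols[6]),         # column 7: start on the protein
            "end": int(cols[7]),           # column 8: stop on the protein
        })
    return dict(domains)


# Made-up rows for two proteins
rows = [
    "prot1\tmd5\t350\tPfam\tPF00069\tProtein kinase domain\t20\t280\t1.2E-50\tT\t01-01-2024",
    "prot1\tmd5\t350\tSMART\tSM00220\tS_TKc\t18\t282\t3.4E-60\tT\t01-01-2024",
    "prot2\tmd5\t120\tPfam\tPF00001\t7tm_1\t5\t100\t1.0E-5\tT\t01-01-2024",
]
doms = parse_interproscan_tsv(rows, "prot1")
print(doms["Pfam"])
# [{'accession': 'PF00069', 'description': 'Protein kinase domain', 'start': 20, 'end': 280}]
```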

transcriptpos_to_genomepos()

Calculates genome coordinates for every nucleotide position of the transcript according to the GFF3 and FASTA files provided as input during the instantiation of the gene class.
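The mapping from transcript positions to genome coordinates depends only on the exon intervals and the strand: on the plus strand the exons are read left to right, and on the minus strand transcript position 1 is the highest coordinate of the rightmost exon. A minimal sketch under those assumptions, with 1-based inclusive exon intervals; transcript_to_genome is a hypothetical helper, not geneplot's method.

```python
def transcript_to_genome(exons, strand):
    """Hypothetical helper: genome coordinate of each 1-based transcript position.

    exons: (start, end) pairs, 1-based inclusive, in any order.
    Element i of the result is the genome coordinate of transcript position i + 1.
    """
    coords = []
    if strand == "+":
        for start, end in sorted(exons):  # 5' to 3' is left to right
            coords.extend(range(start, end + 1))
    elif strand == "-":
        for start, end in sorted(exons, reverse=True):  # 5' to 3' is right to left
            coords.extend(range(end, start - 1, -1))
    else:
        raise ValueError("strand must be '+' or '-'")
    return coords


print(transcript_to_genome([(20, 21), (10, 12)], "+"))  # [10, 11, 12, 20, 21]
print(transcript_to_genome([(20, 21), (10, 12)], "-"))  # [21, 20, 12, 11, 10]
```

A list indexed by transcript position makes it a constant-time lookup to place a coding-sequence or protein-domain position on the genome.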