4. API

4.1. Step1

This module adds the macromolecules and calculates their stoichiometric coefficients.

BOFdat.step1.generate_dna_coefficients(path_to_fasta, path_to_model, DNA_WEIGHT_FRACTION=0.031, conversion_table=None)

Generates a dictionary of metabolite:coefficients for the 4 DNA bases from the organism’s DNA fasta file and the weight percentage of DNA in the cell.

Parameters:
  • path_to_fasta – a path to the DNA fasta file of the organism, format should be compatible with BioPython SeqIO
  • path_to_model – a path to the model, supported formats are json and xml
  • DNA_WEIGHT_FRACTION – the weight fraction of DNA in the entire cell
Returns:

a dictionary of metabolites and coefficients
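As a rough illustration of what converting a weight fraction into coefficients involves, the sketch below counts the four bases in a sequence and scales them so that their combined mass equals the DNA weight fraction. It is not BOFdat's implementation: the function name dna_coefficients_sketch, the single-strand counting and the approximate molar masses are all assumptions made for the example.

```python
from collections import Counter

# Approximate molar masses (g/mol) of the free deoxynucleoside monophosphates.
# Illustrative values only, assumed for this sketch.
MOLAR_MASS = {"A": 331.2, "C": 307.2, "G": 347.2, "T": 322.2}


def dna_coefficients_sketch(sequence, dna_weight_fraction=0.031):
    """Return hypothetical mmol/gDW coefficients for A, C, G and T."""
    counts = Counter(base for base in sequence.upper() if base in MOLAR_MASS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    # Molar fraction of each base in the sequence.
    fractions = {base: counts[base] / total for base in MOLAR_MASS}
    # Mass (g) of one mole of "average" nucleotide.
    mean_mass = sum(fractions[b] * MOLAR_MASS[b] for b in MOLAR_MASS)
    # g DNA per gDW -> mol nucleotide per gDW -> mmol of each base per gDW.
    total_mmol = dna_weight_fraction / mean_mass * 1000
    return {base: fractions[base] * total_mmol for base in MOLAR_MASS}


coefficients = dna_coefficients_sketch("ATGCGCGTTA")
```

By construction, multiplying each coefficient by its molar mass and summing recovers the weight fraction that was passed in, which is a convenient sanity check on any set of macromolecule coefficients.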

BOFdat.step1.generate_rna_coefficients(path_to_genbank, path_to_model, path_to_transcriptomic, RNA_WEIGHT_FRACTION=0.205, rRNA_WEIGHT_FRACTION=0.9, tRNA_WEIGHT_FRACTION=0.05, mRNA_WEIGHT_FRACTION=0.05, identifier='locus_tag', conversion_table=None)

Generates a dictionary of metabolite:coefficients for the 4 RNA bases from the organism’s GenBank annotated file, the total RNA weight percentage and transcriptomic data. Alternatively, the relative abundances of ribosomal, transfer and messenger RNA can be supplied; otherwise the defaults of 90% rRNA, 5% tRNA and 5% mRNA are used.

Parameters:
  • path_to_genbank – a path to the GenBank annotation file of the organism, format should be compatible with BioPython SeqIO
  • path_to_model – a path to the model, supported formats are json and xml
  • path_to_transcriptomic – a two column pandas dataframe (gene_id, abundance)
  • RNA_WEIGHT_FRACTION – the weight fraction of RNA in the entire cell
  • rRNA_WEIGHT_FRACTION – the fraction of rRNA to total
  • tRNA_WEIGHT_FRACTION – the fraction of tRNA to total
  • mRNA_WEIGHT_FRACTION – the fraction of mRNA to total
  • identifier – the type of identifier in the input file, ‘locus_tag’ or ‘geneID’
Returns:

a dictionary of metabolites and coefficients
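The three RNA class fractions partition the total RNA, so they are expected to sum to 1. The following is a minimal sketch of how such fractions could weight per-class base compositions into one overall composition. The helper name combine_rna_composition and the example compositions are hypothetical, and this is not the code BOFdat runs.

```python
import math


def combine_rna_composition(class_composition, class_fraction):
    """Weight per-class base fractions by each class's share of total RNA."""
    if not math.isclose(sum(class_fraction.values()), 1.0):
        raise ValueError("rRNA, tRNA and mRNA fractions should sum to 1")
    combined = {base: 0.0 for base in "ACGU"}
    for rna_class, fraction in class_fraction.items():
        for base, share in class_composition[rna_class].items():
            combined[base] += fraction * share
    return combined


# Default class fractions, as given in the function signature.
fractions = {"rRNA": 0.9, "tRNA": 0.05, "mRNA": 0.05}
# Made-up base compositions, for illustration only.
composition = {
    "rRNA": {"A": 0.25, "C": 0.25, "G": 0.30, "U": 0.20},
    "tRNA": {"A": 0.20, "C": 0.30, "G": 0.30, "U": 0.20},
    "mRNA": {"A": 0.30, "C": 0.20, "G": 0.20, "U": 0.30},
}
overall = combine_rna_composition(composition, fractions)
```

With a 90% share, the rRNA composition dominates the result, which is why the class fractions matter when measured values differ from the defaults.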

BOFdat.step1.generate_protein_coefficients(path_to_genbank, path_to_model, path_to_proteomic, PROTEIN_WEIGHT_FRACTION=0.55, conversion_table=None)

Generates a dictionary of metabolite:coefficients for the 20 amino acids contained in proteins from the organism’s GenBank annotated file, the total protein weight percentage and proteomic data.

Parameters:
  • path_to_genbank – a path to the GenBank annotation file of the organism, format should be compatible with BioPython SeqIO
  • path_to_model – a path to the model, supported formats are json and xml
  • path_to_proteomic – a two column pandas dataframe (protein_id, abundance)
  • PROTEIN_WEIGHT_FRACTION – the weight fraction of protein in the entire cell
Returns:

a dictionary of metabolites and coefficients
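To show why the proteomic abundances matter, here is a hedged sketch of abundance-weighted amino acid composition: each protein's residues count in proportion to how abundant that protein is. The function name weighted_amino_acid_fractions and the toy sequences are hypothetical, and BOFdat's own procedure may differ in its details.

```python
from collections import Counter


def weighted_amino_acid_fractions(sequences, abundances):
    """Return amino acid molar fractions weighted by protein abundance."""
    weighted = Counter()
    for protein_id, sequence in sequences.items():
        # Proteins without a measured abundance contribute nothing here.
        abundance = abundances.get(protein_id, 0.0)
        for residue, count in Counter(sequence).items():
            weighted[residue] += abundance * count
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no overlap between sequences and abundances")
    return {residue: value / total for residue, value in weighted.items()}


# Toy data: two hypothetical proteins and their relative abundances.
sequences = {"p1": "MKK", "p2": "MAA"}
abundances = {"p1": 3.0, "p2": 1.0}
fractions = weighted_amino_acid_fractions(sequences, abundances)
```

In this toy case lysine makes up half of the weighted residues because the lysine-rich protein is three times more abundant than the other.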

BOFdat.step1.generate_lipid_coefficients(path_to_lipidomic, path_to_conversion_file, path_to_model, LIPID_WEIGHT_FRACTION=0.091, R_WEIGHT=284.486)

Generates a dictionary of metabolite:coefficients for the lipid content of the cell. Lipids vary from one species to another. The lipidomic data provides the relative abundance of each lipid species, while the to_bigg_dict converts the identifiers given in the lipidomic data to BiGG identifiers, for which the metabolite weight is known and can easily be added to the biomass.

Parameters:
  • path_to_lipidomic – a dataframe of metabolites identified in the lipidomic experiment
  • path_to_conversion_file – a dictionary converting from the name present in the lipidomic data to BiGG identifiers. This dictionary is generated through manual curation by the modeller.
  • LIPID_WEIGHT_FRACTION – measured lipid weight fraction of the cell, otherwise default
  • R_WEIGHT – weight of a carbon chain, otherwise default. If the weight of the lipid is not known it will be inferred based on the number of R chains and this given weight.
Returns:

a dictionary of metabolites and coefficients that can be used to update the biomass objective function.
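The identifier conversion described above can be pictured with a small sketch: lipid names from the lipidomic data are mapped to BiGG identifiers and the abundances are normalised to relative values. This is an illustration under assumptions, not BOFdat's code. The function name lipid_relative_abundance, the choice to skip unmapped lipids and the example entries are all hypothetical.

```python
def lipid_relative_abundance(lipidomic, to_bigg):
    """Map lipid names to BiGG identifiers and normalise abundances to 1."""
    converted = {}
    for name, abundance in lipidomic.items():
        if name not in to_bigg:
            # Unmapped lipids are skipped in this sketch.
            continue
        bigg_id = to_bigg[name]
        # Several names may map to the same identifier, so accumulate.
        converted[bigg_id] = converted.get(bigg_id, 0.0) + abundance
    total = sum(converted.values())
    if total == 0:
        raise ValueError("no lipid could be mapped to a BiGG identifier")
    return {bigg_id: value / total for bigg_id, value in converted.items()}


# Toy inputs; names, identifiers and abundances are illustrative only.
lipidomic = {"PE(16:0/16:0)": 60.0, "PG(16:0/16:0)": 30.0, "unknown lipid": 10.0}
to_bigg = {"PE(16:0/16:0)": "pe160_c", "PG(16:0/16:0)": "pg160_c"}
relative = lipid_relative_abundance(lipidomic, to_bigg)
```

Lipids that cannot be mapped drop out of the result here, which illustrates why the manually curated conversion dictionary should cover as much of the lipidomic data as possible.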

4.2. Step2

This module finds the coenzymes and inorganic ions and calculates their stoichiometric coefficients.

BOFdat.step2.find_coenzymes_and_ions(path_to_model)

This function finds both coenzymes and inorganic ions in the model. The coenzymes are found based on the level of connectivity of the metabolites. The inorganic ions are found based on prior knowledge of cell ionic composition.

Parameters:
  • path_to_model – The path to the model, json or sbml formats supported
  • WEIGHT_FRACTION – The expected weight fraction of the soluble pool
Returns:

Dictionary of metabolites and stoichiometric coefficients
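Connectivity here can be read as the number of reactions a metabolite takes part in. The sketch below ranks metabolites that way on a toy network. It only illustrates the idea: the function name most_connected_metabolites, the threshold value and the reaction data are hypothetical, and the exact selection criteria BOFdat applies are not described in this section.

```python
from collections import Counter


def most_connected_metabolites(reactions, threshold):
    """Return metabolites that take part in at least `threshold` reactions."""
    connectivity = Counter()
    for metabolites in reactions.values():
        # Count each metabolite once per reaction.
        connectivity.update(set(metabolites))
    return {m: n for m, n in connectivity.items() if n >= threshold}


# Toy network: reaction identifier -> metabolites involved (hypothetical).
reactions = {
    "R1": ["glc", "atp", "g6p", "adp"],
    "R2": ["f6p", "atp", "fdp", "adp"],
    "R3": ["pep", "adp", "pyr", "atp"],
    "R4": ["g6p", "f6p"],
}
hubs = most_connected_metabolites(reactions, threshold=3)
```

In this toy network only atp and adp reach the threshold, which matches the intuition that coenzymes are the hubs of a metabolic network.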

4.3. Step3

This module finds the cluster of species-specific metabolic end goals and calculates their stoichiometric coefficients.

BOFdat.step3.generate_initial_population(population_path, model, base_biomass, exp_essentiality, number_of_populations=3, WEIGHT_FRACTION=0.05, **kwargs)

This function generates a given number of initial populations. An initial population is a matrix of metabolites and individuals where each individual is a list of 0 and 1 corresponding to the presence or absence of a given metabolite.

Parameters:
  • population_path – The path to write the population to. BOFdat will add “pop_N.csv” to the given path, where N is the Nth population generated.
  • number_of_populations – The number of populations to generate, default is 3.
  • model – A model object.
  • base_biomass – The output of step 1 and 2 of BOFdat in a 2 column “.csv” file.
  • exp_essentiality – Experimental essentiality as a 2 column “.csv” file. First column is the gene identifiers, second column is the binary experimental essentiality (“E”: 0, “NE”: 1).
  • WEIGHT_FRACTION – weight fraction of the category represented, between 0 and 1
  • kwargs – {‘metab_index’: the list of metabolites used to generate the population}
Returns:

writes population files to the name provided
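The binary encoding of individuals can be sketched in a few lines: every individual is a list with one 0/1 entry per candidate metabolite. The example below is a toy generator and not BOFdat's. The function name random_population, the fixed number of selected metabolites per individual and the metabolite list are assumptions for illustration.

```python
import random


def random_population(metabolites, n_individuals, n_selected, seed=None):
    """Return a list of binary individuals, one 0/1 entry per metabolite."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_individuals):
        # Pick which metabolite positions are "present" in this individual.
        chosen = set(rng.sample(range(len(metabolites)), n_selected))
        population.append([1 if i in chosen else 0 for i in range(len(metabolites))])
    return population


# Hypothetical candidate metabolites.
metabolites = ["atp_c", "nad_c", "coa_c", "fad_c", "thf_c"]
population = random_population(metabolites, n_individuals=4, n_selected=2, seed=0)
```

Each row of the resulting matrix is one individual, and a 1 in column i means that the i-th metabolite is part of that individual's candidate biomass composition.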

BOFdat.step3.find_metabolites(model_path, init_pop_path, exp_essentiality_path, base_biomass=True, logbook=True, hall_of_fame=True, history=False, processes=None, **kwargs)

This function is the core of BOFdat Step 3. It runs the genetic algorithm on a given population. The algorithm optimizes the metabolite composition of the biomass so that the gene essentiality predicted by the model best matches the experimental essentiality. An initial population of individuals generated with the “generate_initial_population” function is evolved for a given number of generations by systematically applying the genetic operators (mutation, crossover and selection). The individuals whose predicted essentiality best matches the experimental essentiality, as measured by the Matthews correlation coefficient (MCC), are selected.

The outputs of this function are the logbook and the hall of fame. The logbook shows the progression of the evolution, showing the maximum, mean and minimum MCC for each generation. The Hall of Fame contains the metabolite composition of the 1000 best individuals generated through an entire evolution.

Parameters:
  • model_path – Path to the model for which the biomass objective function is defined
  • init_pop_path – Path to the initial population on which to run the algorithm. The population should be generated with the initial_population module
  • exp_essentiality_path – Path to the experimental essentiality data. Two columns csv file, for each gene a 0 in the essentiality column indicates a non-essential gene, a 1 an essential one.
  • base_biomass – default=True, if True a list of metabolites and their coefficients will always be added to the individuals generated throughout the evolution. A pre-determined dictionary of metabolites and coefficients can be added with kwargs.
  • logbook – default=True, generates a logbook of fitness data over generations to the path in kwargs
  • hall_of_fame – default=True, generates a Hall Of Fame of the best individuals of all time to the path in kwargs
  • history – default=False, NOT FUNCTIONAL AS OF 0.1.7
  • processes – default=None, the number of cores to use
  • kwargs –
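The fitness measure named in the description, the Matthews correlation coefficient, compares two binary classifications. A self-contained sketch of the standard formula is given below for reference. The function name mcc_sketch and the example vectors are hypothetical, and BOFdat may compute the score through a library rather than by hand.

```python
import math


def mcc_sketch(observed, predicted):
    """Matthews correlation coefficient for two equal-length 0/1 sequences."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy essentiality vectors, one entry per gene (illustrative only).
experimental = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
score = mcc_sketch(experimental, predicted)
```

The score ranges from -1 (complete disagreement) to 1 (perfect agreement), with 0 corresponding to a prediction no better than chance, so a higher maximum MCC in the logbook indicates a better match to the experimental data.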

BOFdat.step3.cluster_metabolites(outpath, model_path, CONNECTIVITY_THRESHOLD=15, HOF_PERCENT_THRESHOLD=0.2, eps=6, show_frequency=True, show_matrix=True, **kwargs)

This function uses a clustering algorithm based on metabolite network distance to find the metabolic objectives of the cell from the output of the genetic algorithm.

Parameters:
  • outpath – Path to the outputs (Hall of fame) of the genetic algorithm
  • model_path – Path to the model for which the biomass objective function is defined
  • CONNECTIVITY_THRESHOLD – The threshold above which to remove metabolites for network distance calculation
  • show_frequency – boolean, if True will display the frequency of each metabolite in the halls of fame
  • show_matrix – boolean, if True will display the seaborn cluster map for the reduced distance matrix
Returns: