GESSO (gesso.GESSO)

gesso.GESSO is the main class for the gesso Python package. A GESSO object is initialized with a spatial transcriptomics dataset and computes gene set activity scores (GASs) for user-defined gene sets or pathways.

class gesso.GESSO

GESSO (Gene sEt activity Score analysis with Spatial lOcation) is a model for spatially informed gene set expression analysis.

__init__(expression_df: DataFrame, locations_df: DataFrame, genesets_df: DataFrame | None = None, k: int = 6, normalize_counts_method: Literal['normalize', 'normalize-log1p', 'none'] = 'none', verbose: bool = True)

Constructs a GESSO (Gene sEt activity Score analysis with Spatial lOcation) model for spatially informed gene set expression analysis. Given spatial transcriptomics data and a gene set or pathway, GESSO will return a gene set activity score (GAS) for each spatial location (spot).

Parameters:
  • expression_df (pd.DataFrame ~ (n_spots, n_genes)) – A DataFrame containing n_spots rows and n_genes columns. The index will be interpreted as the spot ID. The columns will be interpreted as gene names.

  • locations_df (pd.DataFrame ~ (n_spots, 2)) – A DataFrame containing n_spots rows and 2 columns. The index will be interpreted as the spot ID and must match the index of expression_df. The columns must be named ‘x’ and ‘y’. Each row gives the location (xy coordinates) of that spot.

  • genesets_df (pd.DataFrame ~ (n_genes, n_genesets) | None) – Default: None. A DataFrame containing n_genes rows and n_genesets columns. The index will be interpreted as gene names. The columns will be interpreted as geneset names. The values must be binary (0 or 1). Entry (i, j) is 1 if gene i is in geneset j, and 0 otherwise. If None, gene sets can be provided later during GAS computation.

  • k (int) – Default: 6. Number of nearest neighbors used in the k-nearest neighbors construction of the location graph Laplacian.

  • normalize_counts_method (Literal["normalize", "normalize-log1p", "none"]) – Default: “none”. How to normalize the counts for each spot. If “normalize”, first scales each spot vector (row) so that its total count is 1, then multiplies each row by the median of the total counts across all spots. If “normalize-log1p”, applies the “normalize” steps followed by a log1p transformation. If “none”, no normalization is applied.

  • verbose (bool) – Default: True. Per-instance override for emitting log messages from this model. When True (default), messages flow through the standard gesso.* loggers; configure output via gesso.logging. When False, all messages from this model are suppressed regardless of logger configuration.
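
The input requirements above can be illustrated with a small synthetic dataset. The sketch below builds DataFrames in the documented shapes, checks the index, column, and binary-value constraints, and reproduces the “normalize” and “normalize-log1p” steps as they are described for normalize_counts_method. The helper names (check_inputs, normalize_counts), the spot and gene labels, and the counts are made up for illustration, and the normalization is one reading of the parameter description rather than the package's internal code. The commented-out constructor call at the end uses only the documented signature.

```python
import numpy as np
import pandas as pd

# Toy data: 8 spots on a 4 x 2 grid, 4 genes (all names and counts are made up).
rng = np.random.default_rng(0)
spots = [f"spot_{i}" for i in range(8)]
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

expression_df = pd.DataFrame(
    rng.integers(1, 20, size=(8, 4)).astype(float), index=spots, columns=genes
)
locations_df = pd.DataFrame(
    {"x": [0, 1, 2, 3, 0, 1, 2, 3], "y": [0, 0, 0, 0, 1, 1, 1, 1]}, index=spots
)
# Binary membership matrix: entry (i, j) is 1 if gene i is in gene set j.
genesets_df = pd.DataFrame(
    {"set_1": [1, 1, 0, 0], "set_2": [0, 1, 1, 1]}, index=genes
)


def check_inputs(expression_df, locations_df, genesets_df=None):
    """Hypothetical helper: raise ValueError if a documented constraint is violated."""
    if not expression_df.index.equals(locations_df.index):
        raise ValueError("locations_df index must match expression_df index")
    if locations_df.shape[1] != 2 or set(locations_df.columns) != {"x", "y"}:
        raise ValueError("locations_df columns must be named 'x' and 'y'")
    if genesets_df is not None and not genesets_df.isin([0, 1]).all().all():
        raise ValueError("genesets_df values must be 0 or 1")


def normalize_counts(expression_df, method="none"):
    """Hypothetical re-implementation of the documented normalization options."""
    if method not in ("normalize", "normalize-log1p", "none"):
        raise ValueError(f"unknown method: {method!r}")
    if method == "none":
        return expression_df.copy()
    totals = expression_df.sum(axis=1)
    # Scale each row to total 1, then multiply by the median total count.
    scaled = expression_df.div(totals, axis=0) * totals.median()
    return np.log1p(scaled) if method == "normalize-log1p" else scaled


check_inputs(expression_df, locations_df, genesets_df)
normalized = normalize_counts(expression_df, "normalize")
logged = normalize_counts(expression_df, "normalize-log1p")

# With gesso installed, the same inputs would be passed to the constructor:
# from gesso import GESSO
# model = GESSO(expression_df, locations_df, genesets_df, k=6,
#               normalize_counts_method="normalize-log1p")
```

After “normalize”, every row sums to the median of the original per-spot totals, which is a quick way to sanity-check a normalized matrix.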

compute_gas(genesets: list[str] | None = None, genesets_dict: dict[str, list[str]] | None = None, beta: float = 0.33, compute_method: Literal['cpu', 'lowres'] = 'cpu', n_jobs: int = -1, n_partitions: int | None = None, partition_method: Literal['random', 'stratified_kmeans'] = 'stratified_kmeans', partition_seed: int = 42, store_gene_contributions: bool = True, verbose: bool | None = None) → GeneSetActivityScoresReport

Computes gene set activity scores (GASs) for the requested gene sets at each spot.

Parameters:
  • genesets (list[str] | None) – Default: None. A list of gene set names for which the gene set activity scores (GASs) should be computed. If None (and genesets_dict is None), computes gene set activity scores for all gene sets in the genesets_df supplied at construction.

  • genesets_dict (dict[str, list[str]] | None) – Default: None. A dictionary where the keys are geneset names and the values are lists of genes in the geneset. Overrides the genesets parameter.

  • beta (float) – Default: 0.33. Must be in the interval [0, 1]. Suggested beta < 0.5.

  • compute_method (Literal["cpu", "lowres"]) – Default: “cpu”. The method to use for computation.

  • n_jobs (int) – Default: -1. Number of parallel jobs to run. If -1, uses half of all available CPUs.

  • n_partitions (int | None) – Default: None. Number of low-resolution subsets to use for the lowres method. Used only when compute_method is “lowres”; ignored when compute_method is “cpu”. If not specified, uses n_partitions = int(n_spots / 5000). If n_partitions < 2, uses n_partitions = 2.

  • partition_method (Literal["random", "stratified_kmeans"]) – Default: “stratified_kmeans”. Method to use for partitioning the spots into subsets for the low-resolution method. Ignored if compute_method is “cpu”.

  • partition_seed (int) – Default: 42. Random seed for reproducibility.

  • store_gene_contributions (bool) – Default: True. If True, stores gene contribution values. Set to False for memory-intensive tasks that do not require gene contribution values.

  • verbose (bool | None) – Default: None. Per-call override. If None, inherits from the model’s verbose setting. If False, suppresses all messages (including per-geneset worker output) for this call regardless of logger configuration. If True, emits messages subject to the configuration in gesso.logging.

Returns:

A report containing the gene set activity scores DataFrame and gene contribution DataFrames (if store_gene_contributions is True).

Return type:

GeneSetActivityScoresReport
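
As a sketch of how the arguments above interact, the helpers below restate two documented rules in plain Python: the fallback value of n_partitions for the low-resolution method, and the precedence of genesets_dict over genesets. Both function names are hypothetical and are not part of gesso; they only mirror the parameter descriptions. Applying the minimum of 2 to the computed fallback is an interpretation of the n_partitions text.

```python
def default_n_partitions(n_spots: int) -> int:
    """Fallback n_partitions: int(n_spots / 5000), raised to 2 if smaller."""
    return max(2, int(n_spots / 5000))


def resolve_genesets(available, genesets=None, genesets_dict=None):
    """Pick the gene sets to score, following the documented precedence.

    `available` stands in for the gene sets held in genesets_df,
    given here as a dict mapping gene set name -> list of gene names.
    """
    if genesets_dict is not None:
        # genesets_dict overrides the genesets parameter.
        return dict(genesets_dict)
    if genesets is None:
        # Neither argument given: score every available gene set.
        return dict(available)
    return {name: available[name] for name in genesets}


available = {"set_1": ["GENE_A", "GENE_B"], "set_2": ["GENE_B", "GENE_C"]}
custom = {"demo_set": ["GENE_A", "GENE_C"]}

print(default_n_partitions(4_000))
print(default_n_partitions(23_000))
print(sorted(resolve_genesets(available)))
print(sorted(resolve_genesets(available, genesets=["set_2"])))
print(sorted(resolve_genesets(available, genesets=["set_2"], genesets_dict=custom)))

# With a constructed model, a corresponding call using the documented
# signature might look like this (not executed here):
# report = model.compute_gas(genesets_dict=custom, beta=0.33,
#                            compute_method="lowres", n_partitions=4)
```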

htest_elevated_gas(geneset: str | None = None, genes_in_geneset: list[str] | None = None, beta: float = 0.33, n_permutations: int = 500, seed: int = 42, n_jobs: int = -1) → PermutationTestReport

Conducts a permutation test at each spot to systematically identify spots with significantly elevated gene set activity.

The null hypothesis is that the gene set activity score at each spot does not differ from the activity score of a randomly sampled set of genes of the same size as the gene set.

Parameters:
  • geneset (str | None) – Default: None. Name of the gene set to test. If None, genes_in_geneset must be provided.

  • genes_in_geneset (list[str] | None) – Default: None. List of genes in the gene set to test. If None, geneset must be provided. Overrides geneset if not None.

  • beta (float) – Default: 0.33. Must be in the interval [0, 1]. Suggested beta < 0.5.

  • n_permutations (int) – Default: 500. Number of random gene sets to sample for the test.

  • seed (int) – Default: 42. Random seed for reproducibility.

  • n_jobs (int) – Default: -1. Number of parallel jobs to run. If -1, uses all available CPUs.

Returns:

A report containing the gene set activity scores and p-values for each spot.

Return type:

PermutationTestReport
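
The permutation scheme described for htest_elevated_gas can be sketched with NumPy. The example below is not GESSO's implementation: it substitutes the per-spot mean expression over a gene set for the gene set activity score, and it uses the common (1 + count) / (1 + n_permutations) empirical p-value. Both choices are assumptions made for illustration, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def permutation_pvalues(expr, gene_idx, n_permutations=500, seed=42):
    """Per-spot one-sided permutation test with a stand-in score.

    expr     : (n_spots, n_genes) array of expression values
    gene_idx : column indices of the genes in the tested gene set
    Returns (observed_scores, p_values), each of shape (n_spots,).
    """
    rng = np.random.default_rng(seed)
    n_spots, n_genes = expr.shape
    # Stand-in for the GAS: mean expression over the gene set at each spot.
    observed = expr[:, gene_idx].mean(axis=1)
    exceed = np.zeros(n_spots, dtype=int)
    for _ in range(n_permutations):
        # Random gene set of the same size as the tested gene set.
        random_idx = rng.choice(n_genes, size=len(gene_idx), replace=False)
        null_score = expr[:, random_idx].mean(axis=1)
        exceed += null_score >= observed
    # Add-one correction keeps every p-value strictly positive.
    p_values = (1 + exceed) / (1 + n_permutations)
    return observed, p_values


# Toy matrix: 6 spots x 50 genes, with genes 0-4 elevated at spot 0 only.
rng = np.random.default_rng(0)
expr = rng.poisson(2, size=(6, 50)).astype(float)
expr[0, :5] += 20.0

scores, p_values = permutation_pvalues(expr, gene_idx=[0, 1, 2, 3, 4])
print(p_values)
```

In this toy setup only spot 0 has elevated expression for the tested genes, so it is the spot expected to receive a small p-value.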