Introduction ============ .. currentmodule::gesso GESSO is a Python package designed for the analysis of spatial transcriptomics expression data at the gene set/pathway level. Given a user-provided gene set/pathway, GESSO computes a gene set activity score (GAS) for each spatial spot in the dataset. Installation ------------ Install GESSO into a new Python environment. Your environment should have Python 3.10 or later, as well as pip and git installed. .. code-block:: bash git clone https://github.com/YMa-lab/GESSO.git cd gesso pip install . cd .. Quick Start ----------- GESSO processes spatial transcriptomics data along with a user-defined gene set (pathway) and outputs a gene set activity score (GAS) for each spatial spot. Suppose your spatial transcriptomics dataset contains counts for :math:`G` genes across :math:`N` spots. You’ll need to prepare an :math:`N \times G` expression ``pd.DataFrame`` as well as an :math:`N \times 2` locations ``pd.DataFrame``. The indices of the two DataFrames should match. The locations DataFrame should contain two columns named ``x`` and ``y``. .. code-block:: python import pandas as pd from gesso import GESSO # load data expression_df: pd.DataFrame = ... locations_df: pd.DataFrame = ... # initialize a GESSO model model = GESSO( expression_df=expression_df, locations_df=locations_df, k=20, # increase k to increase spatial smoothing effect normalize_counts_method="normalize-log1p" # optional, use for raw data ) # compute gene set activity scores gas_report = model.compute_gas( genesets_dict={ "example_geneset_1": ["gene1", "gene2", "gene3"], "example_geneset_2": ["gene4", "gene5", "gene6"], }, n_jobs=2 # number of parallel jobs ) gas_df = gas_report.gas_df() # returns N by n_genesets df gas_df.to_csv("gas_output.csv") # test whether each spot exhibits significantly elevated gene set activity htest_report = model.htest_elevated_gas( geneset="example_geneset_1", genes_in_geneset=["gene1", "gene2", "gene3"], control_size=200, n_jobs=8 ) htest_df = htest_report.htest_df() # returns N by 4 df w/ columns 'x', 'y', 'p', 'gas' htest_df.to_csv("htest_output.csv")