Spatial trajectory-based analysis

This tutorial demonstrates how to analyze feature changes along the trajectory inferred by ONTraC.

Features may include cell type composition, gene expression, regulon activity, or any other cell-level or niche-level scores.

Load Modules

import numpy as np
import pandas as pd
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['ps.fonttype'] = 42
mpl.rcParams['font.sans-serif'] = 'Arial'
import matplotlib.pyplot as plt
import seaborn as sns
from pprint import pprint
%matplotlib inline
from optparse import Values
from ONTraC.analysis.data import AnaData
from ONTraC.utils import write_version_info

write_version_info()
##################################################################################

         ▄▄█▀▀██   ▀█▄   ▀█▀ █▀▀██▀▀█                   ▄▄█▀▀▀▄█
        ▄█▀    ██   █▀█   █     ██    ▄▄▄ ▄▄   ▄▄▄▄   ▄█▀     ▀
        ██      ██  █ ▀█▄ █     ██     ██▀ ▀▀ ▀▀ ▄██  ██
        ▀█▄     ██  █   ███     ██     ██     ▄█▀ ██  ▀█▄      ▄
         ▀▀█▄▄▄█▀  ▄█▄   ▀█    ▄██▄   ▄██▄    ▀█▄▄▀█▀  ▀▀█▄▄▄▄▀

                        version: 1.2.2

##################################################################################
from ONTraC.analysis.trajectory import (construct_meta_cell_along_trajectory,
                                        cal_features_correlation_along_trajectory,
                                        plot_scatter_feat_along_trajectory,
                                        plot_cell_type_composition_along_trajectory_from_anadata,
                                        plot_cell_type_composition_along_trajectory)
from scipy import stats
from statsmodels.stats.multitest import multipletests
import gseapy as gp

Dataset Explanation

The Stereo-seq mouse embryonic midbrain dataset was originally published by Chen, et al. The raw data could be download from the MOSTA website.

We assume that you have runned ONTraC on Stereo-seq mouse embryonic midbrain dataset according to our example tutorial.

Load Data

Download dataset

import requests

# URL of the file
url = "https://zenodo.org/records/15571644/files/Stereo_seq_data.zip"

# Local file path to save the file
file_path = "./Stereo_seq_data.zip"

try:
    # Send a GET request to the URL
    response = requests.get(url)
    response.raise_for_status()  # Check if the request was successful

    # Write the content to the file
    with open(file_path, "wb") as file:
        file.write(response.content)

    print(f"File downloaded and saved to {file_path}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
File downloaded and saved to ./Stereo_seq_data.zip
import zipfile

# Path to the zip file
zip_file_path = "Stereo_seq_data.zip"

# Directory where files will be extracted
extract_to_path = "./"

try:
    # Open the zip file
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        # Extract all files to the specified directory
        zip_ref.extractall(extract_to_path)

    print(f"Files extracted to '{extract_to_path}'")
except zipfile.BadZipFile:
    print("The file is not a valid zip file.")
Files extracted to './'

ONTraC output

vis_options = Values()
vis_options.NN_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_NN/'
vis_options.GNN_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_GNN/'
vis_options.NT_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_NT/'
vis_options.reverse = True
vis_options.output = None
ana_data = AnaData(vis_options)
ana_data.meta_data_df.head()
Sample Cell_Type x y
Cell_ID
E12_E1S3_100034 E12_E1S3 Fibro 15940.0 18584.0
E12_E1S3_100035 E12_E1S3 Fibro 15942.0 18623.0
E12_E1S3_100191 E12_E1S3 Endo 15965.0 18619.0
E12_E1S3_100256 E12_E1S3 Fibro 15969.0 18717.0
E12_E1S3_100264 E12_E1S3 Fibro 15974.0 18692.0

Differentiation potency

The differentialtion potency was calculated using moscot.

Please refer our processing codes for details.

ot_res1 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_1_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res1.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E1S3_loc.csv.gz',index_col = 0)
ot_res1.columns = temp.index
ot_res1.head()
E16_E1S3_21 E16_E1S3_22 E16_E1S3_23 E16_E1S3_26 E16_E1S3_27 E16_E1S3_28 E16_E1S3_29 E16_E1S3_31 E16_E1S3_32 E16_E1S3_33 ... E16_E1S3_6920 E16_E1S3_6921 E16_E1S3_6923 E16_E1S3_6924 E16_E1S3_6925 E16_E1S3_6926 E16_E1S3_6928 E16_E1S3_6929 E16_E1S3_6930 E16_E1S3_6931
E14_E1S3_170808 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000e+00 0.0 0.000000 0.0 0.0 0.0 0.000000e+00 0.0 0.0 0.000000e+00
E14_E1S3_170916 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6.345486e-31 0.0 0.000154 0.0 0.0 0.0 0.000000e+00 0.0 0.0 4.531762e-26
E14_E1S3_170934 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000e+00 0.0 0.000000 0.0 0.0 0.0 2.703351e-37 0.0 0.0 0.000000e+00
E14_E1S3_171016 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000e+00 0.0 0.000000 0.0 0.0 0.0 0.000000e+00 0.0 0.0 0.000000e+00
E14_E1S3_171024 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000e+00 0.0 0.000000 0.0 0.0 0.0 0.000000e+00 0.0 0.0 0.000000e+00

5 rows × 6643 columns

ot_res2 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_2_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res2.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E2S6_loc.csv.gz',index_col = 0)
ot_res2.columns = temp.index
ot_res2.head()
E16_E2S6_119 E16_E2S6_147 E16_E2S6_164 E16_E2S6_193 E16_E2S6_199 E16_E2S6_203 E16_E2S6_217 E16_E2S6_222 E16_E2S6_230 E16_E2S6_234 ... E16_E2S6_36290 E16_E2S6_36348 E16_E2S6_36407 E16_E2S6_36523 E16_E2S6_36607 E16_E2S6_36632 E16_E2S6_36755 E16_E2S6_36782 E16_E2S6_36838 E16_E2S6_36891
E14_E1S3_170808 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.000000 0.000000 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.0 0.0
E14_E1S3_170916 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.000000 0.000000 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.0 0.0
E14_E1S3_170934 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.000112 0.000002 9.689674e-31 0.0 0.0 8.916933e-08 0.0 0.0 0.0
E14_E1S3_171016 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.000000 0.000000 0.000000e+00 0.0 0.0 1.239693e-06 0.0 0.0 0.0
E14_E1S3_171024 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.000000 0.000000 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.0 0.0

5 rows × 5506 columns

ot_res3 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_3_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res3.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E2S7_loc.csv.gz',index_col = 0)
ot_res3.columns = temp.index
ot_res3.head()
E16_E2S7_291152 E16_E2S7_291165 E16_E2S7_291300 E16_E2S7_291398 E16_E2S7_291435 E16_E2S7_291487 E16_E2S7_291503 E16_E2S7_291550 E16_E2S7_291585 E16_E2S7_291591 ... E16_E2S7_326320 E16_E2S7_326323 E16_E2S7_326324 E16_E2S7_326325 E16_E2S7_326329 E16_E2S7_326357 E16_E2S7_326359 E16_E2S7_326384 E16_E2S7_326391 E16_E2S7_326412
E14_E1S3_170808 0.0 1.598103e-18 0.0 0.0 7.107961e-09 0.0 1.541208e-25 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E14_E1S3_170916 0.0 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.000000e+00 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E14_E1S3_170934 0.0 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.000000e+00 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E14_E1S3_171016 0.0 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.000000e+00 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E14_E1S3_171024 0.0 0.000000e+00 0.0 0.0 0.000000e+00 0.0 0.000000e+00 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 7261 columns

Gene expression

E14_RGC_scaled_exp = pd.read_csv('./Stereo_seq_data/source/stereo_seq_E14_RGC_scaled_exp.csv.gz', index_col=0)
E14_RGC_scaled_exp.head()
0610005C13Rik 0610006L08Rik 0610009B22Rik 0610009O20Rik 0610010F05Rik 0610010K14Rik 0610012G03Rik 0610025J13Rik 0610030E20Rik 0610031O16Rik ... Zw10 Zwilch Zwint Zxdb Zxdc Zyg11a Zyg11b Zyx Zzef1 Zzz3
Cell_ID
E14_E1S3_171289 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 3.277203 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.091104 -0.068489 -0.389927 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425
E14_E1S3_171863 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.307818 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.091104 -0.068489 -0.389927 -0.073764 9.345431 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425
E14_E1S3_171967 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.307818 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.091104 -0.068489 -0.389927 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425
E14_E1S3_171983 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 2.074143 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.091104 -0.068489 -0.389927 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425
E14_E1S3_172013 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.307818 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.091104 -0.068489 -0.389927 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425

5 rows × 23944 columns

Regulon activities

Regulon activities were calculated using pySCENIC.

Please refer our processing codes for details.

regulon_aucell_df = pd.read_csv('./Stereo_seq_data/source/stereo_seq.auc.csv.gz', index_col=0)
regulon_aucell_df.head()
Ahr Alx1 Alx3 Alx4 Ar Arid3a Arntl2 Arx Atf1 Atf3 ... Zfp821 Zfp874b Zfp941 Zfp944 Zfp979 Zic1 Zkscan14 Zkscan16 Zscan20 Zxdc
Cell
E12_E1S3_100034 0.0 0.014262 0.009032 0.038966 0.0 0.001760 0.0 0.000000 0.0 0.134952 ... 0.0 0.000000 0.0 0.0 0.0 0.051056 0.045712 0.017288 0.0 0.0
E12_E1S3_100035 0.0 0.017741 0.017087 0.000000 0.0 0.003681 0.0 0.000000 0.0 0.167847 ... 0.0 0.003940 0.0 0.0 0.0 0.064695 0.062440 0.026075 0.0 0.0
E12_E1S3_100191 0.0 0.009400 0.009933 0.026062 0.0 0.001426 0.0 0.000000 0.0 0.129891 ... 0.0 0.000000 0.0 0.0 0.0 0.088422 0.042828 0.015757 0.0 0.0
E12_E1S3_100256 0.0 0.017928 0.017399 0.000000 0.0 0.003750 0.0 0.000000 0.0 0.168128 ... 0.0 0.004096 0.0 0.0 0.0 0.048256 0.062728 0.091552 0.0 0.0
E12_E1S3_100264 0.0 0.019671 0.019565 0.042947 0.0 0.004231 0.0 0.080958 0.0 0.176000 ... 0.0 0.005547 0.0 0.0 0.0 0.049611 0.066910 0.028448 0.0 0.0

5 rows × 304 columns

Msigdb gene sets

Please refer release note for details.

import requests

# URL of the file
url = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.5.1/c5.go.bp.v7.5.1.symbols.gmt"

# Local file path to save the file
file_path = "./c5.go.bp.v7.5.1.symbols.gmt"

try:
    # Send a GET request to the URL
    response = requests.get(url)
    response.raise_for_status()  # Check if the request was successful

    # Write the content to the file
    with open(file_path, "wb") as file:
        file.write(response.content)

    print(f"File downloaded and saved to {file_path}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
File downloaded and saved to ./c5.go.bp.v7.5.1.symbols.gmt

Dataset overview

from ONTraC.analysis.cell_type import plot_spatial_cell_type_distribution_dataset_from_anadata

fig, axes = plot_spatial_cell_type_distribution_dataset_from_anadata(ana_data = ana_data,
                                                                     hue_order = ['RGC', 'GlioB', 'NeuB', 'GluNeuB', 'GluNeu', 'GABA', 'Ery', 'Endo', 'Fibro', 'Basal'])

for ax in axes:
    ax.legend(title='Cell Type', loc='upper left', bbox_to_anchor=(1,1), markerscale=2.5)
    # ax.set_aspect('equal', 'box')  # uncomment this line if you want set the x and y axis with same scaling
    # ax.set_xticks([])  # uncomment this line if you don't want to show x coordinates
    # ax.set_yticks([])  # uncomment this line if you don't want to show y coordinates
    # ax.invert_xaxis()  # uncomment this line if you want to invert x axis
    # ax.invert_yaxis()  # uncomment this line if you want to invert y axis
    
    pass
../_images/d266ba8de27a3879971c1ba5e00387f21b71ba849033e96d5601e28fba4b9010.png

Spatial distribution of NT scores

We visualize the spatial distribution of NT scores for each cell to illustrate the spatial trajectory.

Spatial distribution of NT scores for all cells

from ONTraC.analysis.spatial import plot_cell_NT_score_dataset_from_anadata

fig, axes = plot_cell_NT_score_dataset_from_anadata(ana_data)
../_images/cd7ec9ac6e045c186cdffd66258fd990e01c4b064fa007fa7b1190939ab18982.png

Spatial distribution of NT scores for E14.5 RGCs only

# Selecting E14.5 RGCs from the dataset
plot_meta_data = ana_data.meta_data_df[(ana_data.meta_data_df['Sample'] == 'E14_E1S3') & (ana_data.meta_data_df['Cell_Type'] == 'RGC')]
plot_meta_data.head()
Sample Cell_Type x y
Cell_ID
E14_E1S3_171289 E14_E1S3 RGC 19941.0 9116.0
E14_E1S3_171863 E14_E1S3 RGC 20024.0 9512.0
E14_E1S3_171967 E14_E1S3 RGC 20040.0 9397.0
E14_E1S3_171983 E14_E1S3 RGC 20040.0 9219.0
E14_E1S3_172013 E14_E1S3 RGC 20036.0 9352.0
plot_NT_score = ana_data.NT_score.loc[plot_meta_data.index]
plot_NT_score.head()
x y Niche_NTScore Cell_NTScore
Cell_ID
E14_E1S3_171289 19941.0 9116.0 0.862209 0.894205
E14_E1S3_171863 20024.0 9512.0 0.983229 0.979535
E14_E1S3_171967 20040.0 9397.0 0.994614 0.980553
E14_E1S3_171983 20040.0 9219.0 0.988135 0.958657
E14_E1S3_172013 20036.0 9352.0 0.994703 0.976718
# visualization

from ONTraC.analysis.spatial import plot_cell_NT_score_dataset

fig, ax = plot_cell_NT_score_dataset(meta_data_df=plot_meta_data,
                                     NT_score=plot_NT_score,
                                     reverse=ana_data.options.reverse
                                   )
../_images/50eb11f0d2dc9705d0ee4f919b01f1942d12af9ab324c68ea4501c235c78122a.png

Cell type composition along spatial trajcetory

Cell type composition along the spatial trajectory reflects changes in the microenvironment.

Cell type composition change for all cells

fig, ax = plot_cell_type_composition_along_trajectory_from_anadata(
    ana_data=ana_data,  # AnaData object
    cell_types=None,  # Column name(s) in AnaData.meta_data_df that contains the cell type information.
                      # Default is None, which means all cell types in AnaData.cell_type_codes will be used.
    agg_cell_num=100,  # Number of cells to aggregate in each bin along the trajectory. Default is 10. 1 means no aggregation.
    figsize=(8,4),  # Figure size. Default is (6, 2).
    palette=None,  # Color palette for cell types. If None, use default color palette. Keys are cell types and values are colors.
    output_file_path=None  # Path to save the figure. If None, the default path
                           # {ana_data.options.output}/lineplot_raw_cell_type_composition_along_trajectory.pdf is used. 
                           # If ana_data.options.output is also None, the figure will not be saved and the function 
                           # will return the figure and axes objects instead.
)
../_images/7758c8dc2d0e8910525172916e7ca78dff4a5a4b52813ef6fb8e0d9abad5f78a.png

We observe that the dominant cell type shifts along the spatial trajectory, from RGCs (undifferentiated cells) to NeuB, GluNeuB, and ultimately to fully differentiated cells such as GluNeu and GABA neurons.

Cell type composition change around RGCs only

# create data_df
data_df = ana_data.meta_data_df.copy()
data_df = data_df.join(1 - ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
data_df = data_df.join(ana_data.cell_type_composition)
# filtering with cell type
data_df = data_df[data_df['Cell_Type'] == 'RGC']
cell_types = ana_data.cell_type_codes['Cell_Type'].values.tolist()
fig ,ax = plot_cell_type_composition_along_trajectory(
    data_df=data_df,
    trajectory='Cell_NTScore',
    cell_types=cell_types, # type: ignore
    agg_cell_num=100,
    figsize=(7,3),
    palette=None,
    output_file_path=None,
)
../_images/4eb91a114da54ad48041131cb2fe65c0806d76da299f303fec0fc6ddcf3d9595.png

In E14.5, RGCs are primarily located within RGC-dominant microenvironments. Along the spatial trajectory, the proportion of RGCs gradually decreases, while NeuB and GluNeuB cells increase sequentially. We will explore the features associated with this dynamic in the following sections.

Differentiation potency along trajectory

We can investigate differentiation potency by using Moscot to predict the potential offspring of RGCs in the next stage (E16.5). The Moscot output have been loaded in previous section.

Selecting RGCs from Moscot results and aggregate replicates information

target_cells = ana_data.meta_data_df[(ana_data.meta_data_df['Cell_Type'] == 'RGC') & (ana_data.meta_data_df['Sample'] == 'E14_E1S3')]
def ot_res_process(ot_res):
    ot_res = ot_res.loc[ot_res.index.isin(ana_data.meta_data_df.index),
                       ot_res.columns.isin(ana_data.meta_data_df.index)]
    top_5_indices = ot_res.apply(lambda row: row.nlargest(5).index, axis=1)
    top_5_cell_types = top_5_indices.apply(lambda x: ana_data.meta_data_df.loc[x, 'Cell_Type'])
    summary_df = top_5_cell_types.apply(lambda x: x.value_counts(), axis=1).fillna(0).astype(int)
    summary_df = summary_df.loc[target_cells.index]
    summary_df = summary_df.div(summary_df.sum(axis=1).values, axis=0)
    summary_df = summary_df.join(1-ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
    
    return summary_df
    
summary_df_1 = ot_res_process(ot_res1)
summary_df_1.head()
Basal Endo Ery Fibro GABA GlioB GluNeu GluNeuB NeuB RGC Cell_NTScore
Cell_ID
E14_E1S3_171289 0.0 0.0 0.0 0.2 0.4 0.0 0.0 0.0 0.0 0.4 0.105795
E14_E1S3_171863 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 0.0 0.2 0.020465
E14_E1S3_171967 0.0 0.0 0.0 0.0 0.0 0.4 0.0 0.6 0.0 0.0 0.019447
E14_E1S3_171983 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.8 0.0 0.0 0.041343
E14_E1S3_172013 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.4 0.0 0.4 0.023282
E14_RGC_metacell_diff_p_1_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_1,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_1_df.head()
Basal Endo Ery Fibro GABA GlioB GluNeu GluNeuB NeuB RGC Cell_NTScore
Cell_ID
E14_E1S3_173789 0.0 0.0 0.00 0.00 0.02 0.16 0.00 0.14 0.30 0.38 0.000230
E14_E1S3_173259 0.0 0.0 0.00 0.02 0.00 0.16 0.02 0.22 0.20 0.38 0.000240
E14_E1S3_174106 0.0 0.0 0.02 0.00 0.00 0.06 0.02 0.24 0.16 0.50 0.000250
E14_E1S3_173417 0.0 0.0 0.00 0.02 0.08 0.14 0.02 0.30 0.04 0.40 0.000266
E14_E1S3_173284 0.0 0.0 0.06 0.00 0.08 0.10 0.02 0.24 0.16 0.34 0.000285
summary_df_2 = ot_res_process(ot_res2)
E14_RGC_metacell_diff_p_2_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_2,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_2_df.head()
Basal Endo Ery Fibro GABA GlioB GluNeu GluNeuB NeuB RGC Cell_NTScore
Cell_ID
E14_E1S3_173789 0.0 0.0 0.02 0.06 0.12 0.12 0.02 0.06 0.10 0.50 0.000230
E14_E1S3_173259 0.0 0.0 0.00 0.00 0.08 0.14 0.00 0.04 0.14 0.60 0.000240
E14_E1S3_174106 0.0 0.0 0.02 0.04 0.04 0.00 0.02 0.08 0.12 0.68 0.000250
E14_E1S3_173417 0.0 0.0 0.02 0.00 0.04 0.14 0.00 0.08 0.06 0.66 0.000266
E14_E1S3_173284 0.0 0.0 0.00 0.02 0.10 0.12 0.02 0.08 0.08 0.58 0.000285
summary_df_3 = ot_res_process(ot_res3)
E14_RGC_metacell_diff_p_3_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_3,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_3_df.head()
Basal Endo Ery Fibro GABA GlioB GluNeu GluNeuB NeuB RGC Cell_NTScore
Cell_ID
E14_E1S3_173789 0.00 0.0 0.00 0.12 0.02 0.02 0.04 0.10 0.04 0.66 0.000230
E14_E1S3_173259 0.02 0.0 0.02 0.04 0.02 0.02 0.00 0.06 0.02 0.80 0.000240
E14_E1S3_174106 0.00 0.0 0.12 0.06 0.08 0.02 0.02 0.08 0.00 0.62 0.000250
E14_E1S3_173417 0.02 0.0 0.04 0.06 0.04 0.02 0.04 0.14 0.00 0.64 0.000266
E14_E1S3_173284 0.00 0.0 0.02 0.06 0.12 0.04 0.08 0.20 0.00 0.48 0.000285
E14_RGC_metacell_diff_p_1_melted_df = E14_RGC_metacell_diff_p_1_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_1_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_1_melted_df['replicate'] = ['rep1'] * E14_RGC_metacell_diff_p_1_melted_df.shape[0]
E14_RGC_metacell_diff_p_1_melted_df.head()
Cell_NTScore Cell type Proportion replicate
0 0.000230 Basal 0.0 rep1
1 0.000240 Basal 0.0 rep1
2 0.000250 Basal 0.0 rep1
3 0.000266 Basal 0.0 rep1
4 0.000285 Basal 0.0 rep1
E14_RGC_metacell_diff_p_2_melted_df = E14_RGC_metacell_diff_p_2_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_2_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_2_melted_df['replicate'] = ['rep2'] * E14_RGC_metacell_diff_p_2_melted_df.shape[0]
E14_RGC_metacell_diff_p_2_melted_df.head()
Cell_NTScore Cell type Proportion replicate
0 0.000230 Basal 0.0 rep2
1 0.000240 Basal 0.0 rep2
2 0.000250 Basal 0.0 rep2
3 0.000266 Basal 0.0 rep2
4 0.000285 Basal 0.0 rep2
E14_RGC_metacell_diff_p_3_melted_df = E14_RGC_metacell_diff_p_3_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_3_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_3_melted_df['replicate'] = ['rep3'] * E14_RGC_metacell_diff_p_3_melted_df.shape[0]
E14_RGC_metacell_diff_p_3_melted_df.head()
Cell_NTScore Cell type Proportion replicate
0 0.000230 Basal 0.00 rep3
1 0.000240 Basal 0.02 rep3
2 0.000250 Basal 0.00 rep3
3 0.000266 Basal 0.02 rep3
4 0.000285 Basal 0.00 rep3
E14_RGC_metacell_diff_p_melted = pd.concat([E14_RGC_metacell_diff_p_1_melted_df,
                                            E14_RGC_metacell_diff_p_2_melted_df,
                                            E14_RGC_metacell_diff_p_3_melted_df])
E14_RGC_metacell_diff_p_melted.head()
Cell_NTScore Cell type Proportion replicate
0 0.000230 Basal 0.0 rep1
1 0.000240 Basal 0.0 rep1
2 0.000250 Basal 0.0 rep1
3 0.000266 Basal 0.0 rep1
4 0.000285 Basal 0.0 rep1
sns.lmplot(data = E14_RGC_metacell_diff_p_melted[E14_RGC_metacell_diff_p_melted['Cell type'].isin(['RGC', 'NeuB'])],
           x = 'Cell_NTScore',
           y = 'Proportion',
           hue = 'Cell type',
           scatter_kws={'edgecolor': None},
           ci=95,
          )
<seaborn.axisgrid.FacetGrid at 0x151b22d76110>
../_images/99bf0669d162d1a93ba438dd0a89bbaa77143aec58edb8549746fa07747a207e.png

Along the spatial trajectory, the probability of RGCs maintaining their identity decreases, while the probability of their differentiation into NeuB increases.

Gene expression changes along spatial trajectory

Next, we explore gene expression dynamics along the spatial trajectory.

E14_RGC_gene_exp_df = E14_RGC_scaled_exp.join(1 - ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
# The meta-cell could reduce the noise here
E14_RGC_gene_exp_metacell_data_df = construct_meta_cell_along_trajectory(
    meta_data_df = E14_RGC_gene_exp_df,
    trajectory = 'Cell_NTScore',
    n_cells = 10
)
E14_RGC_gene_exp_metacell_data_df.head()
0610005C13Rik 0610006L08Rik 0610009B22Rik 0610009O20Rik 0610010F05Rik 0610010K14Rik 0610012G03Rik 0610025J13Rik 0610030E20Rik 0610031O16Rik ... Zwilch Zwint Zxdb Zxdc Zyg11a Zyg11b Zyx Zzef1 Zzz3 Cell_NTScore
Cell_ID
E14_E1S3_173789 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.098032 0.516922 -0.025192 -0.088172 -0.008486 ... 0.938360 0.371673 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 0.353277 0.000230
E14_E1S3_173259 -0.036511 -0.008535 -0.198124 -0.106792 0.117162 0.537851 -0.049493 -0.025192 -0.088172 -0.008486 ... -0.068489 0.011324 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 0.946734 -0.199425 0.000240
E14_E1S3_174106 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.093338 0.008216 -0.025192 -0.088172 -0.008486 ... -0.068489 -0.117173 0.933613 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 0.234650 0.000250
E14_E1S3_173417 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 -0.064664 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.068489 -0.081511 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425 0.000266
E14_E1S3_173284 -0.036511 -0.008535 -0.198124 -0.106792 -0.239179 0.321239 -0.321011 -0.025192 -0.088172 -0.008486 ... -0.068489 -0.389927 -0.073764 -0.117644 -0.043855 -0.223836 -0.21233 -0.148131 -0.199425 0.000285

5 rows × 23945 columns

gene_correlated_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    rho_threshold=0.4,
    p_val_threshold=0.01
)
print(gene_correlated_df.shape)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])
(109, 2)

We identified 109 genes whose expression levels are significantly correlated with cell-level NT scores. Here, we highlight six representative genes that are strongly associated with neuronal differentiation and maturation.

gene_correlated_df.head()
PCC P_Value
Feature
Cdkn1c 0.629617 9.666948e-07
Gm3764 0.623302 1.333598e-06
Lbh 0.618297 1.712368e-06
Bbs9 0.550067 3.502920e-05
Phf21b 0.542046 4.788367e-05

Different Filtering Parameters

You can also select top N genes by following command:

cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    top_n=5,
    rho_threshold=0.4,
    p_val_threshold=0.01
)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])
PCC P_Value
Feature
Cdkn1c 0.629617 9.666948e-07
Gm3764 0.623302 1.333598e-06
Lbh 0.618297 1.712368e-06
Bbs9 0.550067 3.502920e-05
Phf21b 0.542046 4.788367e-05
Mcm3 -0.453214 9.492690e-04
Eif2s2 -0.456194 8.696506e-04
Tuba1b -0.470727 5.608145e-04
Ccnd2 -0.479827 4.218648e-04
Ppia -0.484622 3.619282e-04

Heatmap Showing Trajectory-Associated Genes

filtered_metacell_gene_exp_df = E14_RGC_gene_exp_metacell_data_df.loc[:,gene_correlated_df.index]
filtered_metacell_gene_exp_df.head()
Feature Cdkn1c Gm3764 Lbh Bbs9 Phf21b Helt Nkain3 Pgpep1 Rhbdl3 Akr1b8 ... Cd24a Rps8 Rpl22l1 Rpl19 Phgdh Mcm3 Eif2s2 Tuba1b Ccnd2 Ppia
Cell_ID
E14_E1S3_173789 -0.126359 -0.152949 0.148899 -0.111262 -0.150273 -0.108984 -0.237243 -0.167096 -0.122761 -0.052675 ... -0.418015 0.575271 0.220949 0.309830 0.164429 -0.159499 0.234015 0.449034 0.433422 0.543230
E14_E1S3_173259 -0.570163 -0.152949 -0.192196 -0.111262 -0.150273 -0.108984 -0.237243 -0.167096 -0.122761 -0.052675 ... -0.251175 0.916426 0.492244 0.625080 1.361812 0.350744 0.431511 0.532895 0.642079 0.112190
E14_E1S3_174106 -0.570163 -0.152949 -0.192196 -0.111262 -0.150273 -0.108984 -0.237243 -0.167096 -0.122761 -0.052675 ... -0.207797 0.531247 -0.033359 0.499405 1.262983 1.341346 0.214287 0.608114 -0.371306 0.839393
E14_E1S3_173417 -0.570163 -0.152949 -0.192196 -0.111262 -0.150273 -0.108984 -0.237243 -0.167096 -0.122761 -0.052675 ... -0.593854 0.697930 0.274315 -0.226063 -0.240031 1.718599 -0.099978 0.528772 0.449835 0.103245
E14_E1S3_173284 -0.570163 -0.152949 -0.192196 -0.111262 -0.150273 -0.108984 -0.237243 -0.167096 -0.122761 -0.052675 ... -0.371482 0.731672 0.595815 0.640199 0.716297 1.006487 0.340025 0.822160 0.210475 0.089651

5 rows × 109 columns

fig, ax = plt.subplots(figsize=(5, 20))
sns.heatmap(filtered_metacell_gene_exp_df.iloc[::-1,].T,
            cmap='bwr',
            vmin=-2,
            vmax=2,
            cbar_kws={'label': 'Normalized Gene Expression'},
            ax=ax)
ax.set_xticks([])
ax.set_xlabel('0 -------------- NT Scores --------------> 1')
ax.set_ylabel('Genes')
Text(33.222222222222214, 0.5, 'Genes')
../_images/8a14bd6b6a54d0db538d82a02756956d8770dbc039813cf6b6ad5f5c26fadec9.png

Diagnosis of Selected Genes

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Ppia',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
)
../_images/804a72e4f35e0e06d94561cc53392ce404c7ceb4b564a7cb1138f4565f5a98f7.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Cdkn1c',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    ci=95,  # Size of the confidence interval for the regression estimate
)
../_images/b1f8347ec6b013a5d09f708ff990de1d9fd761ecf95ff5f8d716c9b973a66fed.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Ccnd2',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    line_kws = {'color': 'green', 'lw': 4},  # line parameters
    ci=70,  # Size of the confidence interval for the regression estimate
)
../_images/631d7c7c078fb2498834b160fa54706230837a961eeb7693714ff3b4ffd71aeb.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Phf21b',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'orange', 'edgecolor': 'gray', 's': 50},  # scatter parameters
)
../_images/eac53cd413aa8f024034ed446769da3ca1d7d17e3364874e2651d0b9f82cf0a2.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Efna5',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'purple', 'edgecolor': 'gray', 's': 50},  # scatter parameters
    line_kws = {'color': 'blue', 'lw': 4, 'ls': '--'},  # line parameters
    ci = 95,  # Size of the confidence interval for the regression estimate
    marker = '*',  # Marker to use for the scatterplot glyphs.
)
../_images/3f2a45dc72eaf4ad95bb310461ddf2ab1a38ff4f1e540446d1c75cbf2d59579c.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Dcc',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'green', 's': 50},  # scatter parameters
    line_kws = {'color': 'C0', 'lw': 2, 'ls': '-.'},  # line parameters
    ci = 95,  # Size of the confidence interval for the regression estimate.
    marker = 'x',  # Marker to use for the scatterplot glyphs.
)
../_images/82914105f755e22455da9d57062f26cbb51f23c3a98b9f1ea9fed7c6ae78383c.png

Gene Set Enrichment Analysis (GSEA)

# keep all genes while calculate correlation
all_gene_correlation_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
)
print(all_gene_correlation_df.shape)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])
(13636, 2)
all_gene_correlation_df.head()
PCC P_Value
Feature
Cdkn1c 0.629617 9.666948e-07
Gm3764 0.623302 1.333598e-06
Lbh 0.618297 1.712368e-06
Bbs9 0.550067 3.502920e-05
Phf21b 0.542046 4.788367e-05
all_gene_correlation_df.tail()
PCC P_Value
Feature
Mcm3 -0.453214 0.000949
Eif2s2 -0.456194 0.000870
Tuba1b -0.470727 0.000561
Ccnd2 -0.479827 0.000422
Ppia -0.484622 0.000362
reject, qvals, _, _ = multipletests(all_gene_correlation_df['P_Value'].values, alpha=0.05, method="fdr_bh")
all_gene_correlation_df.loc[:,'FDR Q-Value'] = qvals
all_gene_correlation_df.head()
PCC P_Value FDR Q-Value
Feature
Cdkn1c 0.629617 9.666948e-07 0.007783
Gm3764 0.623302 1.333598e-06 0.007783
Lbh 0.618297 1.712368e-06 0.007783
Bbs9 0.550067 3.502920e-05 0.119415
Phf21b 0.542046 4.788367e-05 0.130588
all_gene_correlation_df.tail()
PCC P_Value FDR Q-Value
Feature
Mcm3 -0.453214 0.000949 0.311213
Eif2s2 -0.456194 0.000870 0.311213
Tuba1b -0.470727 0.000561 0.275297
Ccnd2 -0.479827 0.000422 0.225432
Ppia -0.484622 0.000362 0.224330
# corrected, only one columns should list here, otherwise, there are multiple ranks as inputs
corr_rnk = all_gene_correlation_df.iloc[:,:1]
corr_rnk.head()
PCC
Feature
Cdkn1c 0.629617
Gm3764 0.623302
Lbh 0.618297
Bbs9 0.550067
Phf21b 0.542046
corr_rnk.tail()
PCC
Feature
Mcm3 -0.453214
Eif2s2 -0.456194
Tuba1b -0.470727
Ccnd2 -0.479827
Ppia -0.484622
pre_res = gp.prerank(rnk=corr_rnk, # or rnk = rnk,
                     gene_sets='./c5.go.bp.v7.5.1.symbols.gmt',
                     threads=4,
                     min_size=5,
                     max_size=1000,
                     permutation_num=1000, # reduce number to speed up testing
                     outdir=None, # don't write to disk
                     seed=6,
                     verbose=True, # see what's going on behind the scenes
                    )
2025-08-24 12:13:10,204 [WARNING] Duplicated values found in preranked stats: 16.66% of genes
The order of those genes will be arbitrary, which may produce unexpected results.
2025-08-24 12:13:10,204 [INFO] Parsing data files for GSEA.............................
2025-08-24 12:13:10,381 [INFO] 1188 gene_sets have been filtered out when max_size=1000 and min_size=5
2025-08-24 12:13:10,382 [INFO] 6470 gene_sets used for further statistical testing.....
2025-08-24 12:13:10,382 [INFO] Start to run GSEA...Might take a while..................
2025-08-24 12:13:10,383 [INFO] Genes are converted to uppercase.
2025-08-24 12:14:15,539 [INFO] Congratulations. GSEApy runs successfully................
res = pre_res.res2d
filter_res = res[[True if x.startswith('GOBP') else False for x in res['Term']]]
filter_res = filter_res[filter_res['NOM p-val'] < 0.05]
filter_res.head()
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank GOBP_CYTOPLASMIC_TRANSLATION -0.449215 -2.385238 0.0 0.004127 0.004 58/135 20.85% Eif2s2;Rpl19;Rpl22l1;Rps8;Rpl3;Rps23;Rpl8;Rpl4...
1 prerank GOBP_TRICARBOXYLIC_ACID_CYCLE -0.624419 -2.374058 0.0 0.002579 0.004 9/27 6.50% Idh2;Sdhd;Suclg2;Dlat;Idh3a;Mrps36;Dlst;Pdhb;Cs
2 prerank GOBP_DNA_REPLICATION_INITIATION -0.57303 -2.36083 0.0 0.00172 0.004 12/35 11.85% Mcm3;Mcm5;Topbp1;Pola1;Prim2;Mcm2;Ccne2;Mcm6;M...
3 prerank GOBP_DNA_UNWINDING_INVOLVED_IN_DNA_REPLICATION -0.667623 -2.270939 0.0 0.006965 0.026 12/20 17.51% Mcm3;Mcm5;Gins2;Blm;Mcm2;Mcm6;Twnk;Cdc45;Gins4...
4 prerank GOBP_DNA_DEPENDENT_DNA_REPLICATION -0.4078 -2.236996 0.0 0.00908 0.039 57/144 21.58% Mcm3;Mcm5;Topbp1;Pola1;Gins2;Blm;Eme1;Rfc1;Cdk...
filter_res.sort_values('NES').head(20)
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank GOBP_CYTOPLASMIC_TRANSLATION -0.449215 -2.385238 0.0 0.004127 0.004 58/135 20.85% Eif2s2;Rpl19;Rpl22l1;Rps8;Rpl3;Rps23;Rpl8;Rpl4...
1 prerank GOBP_TRICARBOXYLIC_ACID_CYCLE -0.624419 -2.374058 0.0 0.002579 0.004 9/27 6.50% Idh2;Sdhd;Suclg2;Dlat;Idh3a;Mrps36;Dlst;Pdhb;Cs
2 prerank GOBP_DNA_REPLICATION_INITIATION -0.57303 -2.36083 0.0 0.00172 0.004 12/35 11.85% Mcm3;Mcm5;Topbp1;Pola1;Prim2;Mcm2;Ccne2;Mcm6;M...
3 prerank GOBP_DNA_UNWINDING_INVOLVED_IN_DNA_REPLICATION -0.667623 -2.270939 0.0 0.006965 0.026 12/20 17.51% Mcm3;Mcm5;Gins2;Blm;Mcm2;Mcm6;Twnk;Cdc45;Gins4...
4 prerank GOBP_DNA_DEPENDENT_DNA_REPLICATION -0.4078 -2.236996 0.0 0.00908 0.039 57/144 21.58% Mcm3;Mcm5;Topbp1;Pola1;Gins2;Blm;Eme1;Rfc1;Cdk...
5 prerank GOBP_DOUBLE_STRAND_BREAK_REPAIR_VIA_BREAK_INDU... -0.78386 -2.145294 0.0 0.036972 0.174 10/11 20.09% Mcm3;Mcm5;Gins2;Mcm2;Mcm6;Cdc45;Gins4;Mcm7;Cdc...
6 prerank GOBP_NEGATIVE_REGULATION_OF_RNA_SPLICING -0.572831 -2.084832 0.0 0.069129 0.338 10/25 13.52% Srsf4;Ptbp1;Hnrnpa2b1;Srsf7;C1qbp;Sap18;Rps26;...
7 prerank GOBP_DNA_REPLICATION -0.351203 -2.077697 0.0 0.066808 0.37 99/253 23.39% Mcm3;Mcm5;Set;Topbp1;Pola1;Nasp;Gins2;Blm;Rrm2...
8 prerank GOBP_CHROMATIN_REMODELING_AT_CENTROMERE -0.762886 -2.074276 0.0 0.062136 0.383 5/9 6.65% Hells;Nasp;Hjurp;Mis18a;Oip5
9 prerank GOBP_DNA_GEOMETRIC_CHANGE -0.407036 -2.048352 0.0 0.07728 0.47 30/85 15.53% Mcm3;Mcm5;Gins2;Blm;Dscc1;Hnrnpa2b1;Brip1;Mcm2...
10 prerank GOBP_HISTONE_EXCHANGE -0.635976 -2.041671 0.0 0.077008 0.495 5/16 6.65% Nasp;Hjurp;Spty2d1;Mis18a;Oip5
11 prerank GOBP_ATRIAL_CARDIAC_MUSCLE_CELL_TO_AV_NODE_CEL... -0.680813 -2.04045 0.0 0.072053 0.503 9/13 20.41% Cacna1c;Cacnb2;Trpm4;Gjc1;Gja1;Scn3b;Kcna5;Kcn...
12 prerank GOBP_SNORNA_LOCALIZATION -0.889979 -2.026537 0.0 0.077225 0.564 5/6 9.18% Fbl;Nop58;Pih1d1;Prpf31;Znhit3
13 prerank GOBP_DERMATAN_SULFATE_PROTEOGLYCAN_METABOLIC_P... -0.838072 -2.022415 0.0 0.07532 0.582 6/7 15.40% B3gat3;Idua;Chst12;Dsel;Csgalnact2;Chst14
15 prerank GOBP_CENP_A_CONTAINING_CHROMATIN_ORGANIZATION -0.82405 -1.995891 0.0 0.095268 0.679 4/7 6.65% Nasp;Hjurp;Mis18a;Oip5
16 prerank GOBP_POSITIVE_REGULATION_OF_PATTERN_RECOGNITIO... -0.522305 -1.99507 0.0 0.090152 0.681 17/26 27.00% Usp15;Rtn4;Ankrd17;Ninj1;Tirap;Hmgb1;Cav1;Ifi3...
17 prerank GOBP_AEROBIC_RESPIRATION -0.364381 -1.9943 0.0 0.08582 0.686 50/147 22.85% Ndufs1;Cox5b;Ndufc2;Ide;Uqcr10;Idh2;Sdhd;Ccnb1...
18 prerank GOBP_REGULATION_OF_DNA_DIRECTED_DNA_POLYMERASE... -0.638066 -1.990986 0.0 0.083803 0.695 7/13 14.97% Gins2;Dscc1;Rfc5;Gins4;Rfc3;Pcna;Polg2
19 prerank GOBP_NEGATIVE_REGULATION_OF_MRNA_SPLICING_VIA_... -0.582963 -1.97912 0.004662 0.092806 0.749 9/20 13.52% Srsf4;Ptbp1;Hnrnpa2b1;Srsf7;C1qbp;Sap18;Sfswap...
20 prerank GOBP_DOUBLE_STRAND_BREAK_REPAIR -0.332017 -1.964585 0.0 0.105603 0.802 93/231 24.37% Mcm3;Mcm5;Pola1;Smchd1;Nabp2;Gins2;Blm;Babam1;...
filter_res.sort_values('NES').tail(20)
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
92 prerank GOBP_POSITIVE_REGULATION_OF_HAIR_FOLLICLE_DEVE... 0.725166 1.743163 0.005639 1.0 1.0 3/8 4.61% Tgfb2;Gal;Krt17
86 prerank GOBP_REGULATION_OF_PROTEIN_LOCALIZATION_TO_CEN... 0.6737 1.750461 0.010949 1.0 1.0 6/10 11.80% Bicd1;Cep250;Mark4;Nup62;Pard6a;Ubxn2b
83 prerank GOBP_PLANAR_CELL_POLARITY_PATHWAY_INVOLVED_IN_... 0.661712 1.751047 0.015652 1.0 1.0 6/11 16.61% Dvl3;Sfrp2;Fzd1;Dvl1;Nphp3;Sfrp1
77 prerank GOBP_OLFACTORY_LOBE_DEVELOPMENT 0.498639 1.761532 0.006319 1.0 1.0 10/30 12.64% Sall1;Eomes;Efna2;Slit1;Chd7;Srf;Erbb4;Robo1;R...
74 prerank GOBP_REGULATION_OF_FATTY_ACID_BETA_OXIDATION 0.570444 1.76705 0.008606 1.0 1.0 8/17 15.92% Etfbkmt;Cnr1;Abcd2;Lonp2;Irs1;Mlycd;Tysnd1;Acacb
71 prerank GOBP_NEGATIVE_REGULATION_OF_COLLATERAL_SPROUTING 0.731298 1.773559 0.007707 1.0 1.0 6/8 16.75% Dcc;Ptprs;Spp1;Ulk1;Epha7;Ulk2
68 prerank GOBP_ANTERIOR_POSTERIOR_AXON_GUIDANCE 0.756514 1.780581 0.0 1.0 1.0 3/7 1.69% Lhx1;Dcc;Lhx9
65 prerank GOBP_REGULATION_OF_NEURON_PROJECTION_ARBORIZATION 0.621632 1.78363 0.003656 1.0 1.0 6/13 18.11% Dvl3;Dlg4;Macf1;Dvl1;Map3k13;Ntng1
62 prerank GOBP_NON_CANONICAL_WNT_SIGNALING_PATHWAY_VIA_M... 0.807137 1.796288 0.001919 1.0 1.0 3/6 0.70% Wnt4;Dvl3;Nkd1
58 prerank GOBP_FEEDING_BEHAVIOR 0.448003 1.800532 0.002899 1.0 1.0 15/53 12.48% Helt;Dach1;Cck;Cnr1;Lepr;Ttc21b;Npy;Gal;Fyn;Po...
57 prerank GOBP_SPINAL_CORD_ASSOCIATION_NEURON_DIFFERENTI... 0.802871 1.804829 0.003683 1.0 1.0 4/6 10.59% Lhx1;Ascl1;Gsx1;Tal1
51 prerank GOBP_METANEPHRIC_NEPHRON_DEVELOPMENT 0.544978 1.826752 0.001715 1.0 0.999 10/26 18.22% Lhx1;Sall1;Wnt4;Tcf21;Pkd2;Tfap2b;Agtr2;Sox9;L...
45 prerank GOBP_CALCIUM_DEPENDENT_CELL_CELL_ADHESION_VIA_... 0.545783 1.843529 0.001623 1.0 0.999 12/23 19.58% Nlgn1;Arvcf;Cdh2;Cdh8;Dchs1;Atp2c1;Cdh9;Cdh13;...
44 prerank GOBP_DORSAL_SPINAL_CORD_DEVELOPMENT 0.713258 1.851775 0.001795 1.0 0.998 5/10 16.04% Lhx1;Ascl1;Gsx1;Tal1;Prox1
43 prerank GOBP_POSITIVE_REGULATION_OF_PROTEIN_LOCALIZATI... 0.793517 1.855019 0.005455 1.0 0.998 5/7 9.18% Bicd1;Cep250;Mark4;Nup62;Pard6a
40 prerank GOBP_GLYCOSIDE_METABOLIC_PROCESS 0.681614 1.858309 0.003676 1.0 0.996 6/12 16.68% Gla;Gba2;Fuca2;Cbr4;Abhd10;Fuca1
38 prerank GOBP_DIGESTIVE_SYSTEM_DEVELOPMENT 0.429413 1.87548 0.0 1.0 0.983 35/94 21.07% Cdkn1c;Shox2;Ascl1;Sall1;Hlx;Tcf21;Hip1r;Gata4...
36 prerank GOBP_AUTOPHAGY_OF_NUCLEUS 0.63459 1.879821 0.001695 1.0 0.98 8/15 18.20% Atg9a;Atg7;Atg2a;Ulk1;Rb1cc1;Ulk2;Wipi2;Wdr45
27 prerank GOBP_GLYCOSIDE_CATABOLIC_PROCESS 0.778249 1.928485 0.0 1.0 0.887 5/8 16.68% Gla;Gba2;Fuca2;Abhd10;Fuca1
14 prerank GOBP_GOLGI_TO_LYSOSOME_TRANSPORT 0.763518 2.009531 0.0 0.926462 0.553 4/10 7.08% Ehd3;Ccdc91;Sort1;Rbsn
plot_data = filter_res.sort_values('NES').tail(20)
plot_data['Term'] = [x.split('_',1)[1].replace('_', ' ').capitalize() for x in plot_data['Term']]

gp.dotplot(plot_data,column="NOM p-val",top_term=20)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/gseapy/plot.py:738: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[self.colname] = df[self.colname].replace(0, np.nan).bfill()
<Axes: xlabel='NES'>
../_images/53d1ed28b040193d16dcf4d4617c89999cf3714a7d485109f0ad36e85b962050.png
plot_data = filter_res.sort_values('NES').head(20)
plot_data['Term'] = [x.split('_',1)[1].replace('_', ' ').capitalize() for x in plot_data['Term']]

gp.dotplot(plot_data,column="NOM p-val",top_term=20)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/gseapy/plot.py:738: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[self.colname] = df[self.colname].replace(0, np.nan).bfill()
<Axes: xlabel='NES'>
../_images/f12d762a8128e85effd806282dcbac6492915c6a0e81eeb5a9e4643b106d6923.png

A negative NES indicates that genes associated with a given Gene Ontology (GO) term are highly expressed at the start of the trajectory (i.e., cells with low NT scores). For example, enrichment of DNA replication–related terms aligns with the potential of these cells to be preserved as RGCs in the next developmental stage (E16.5).

A positive NES indicates that GO term–associated genes are highly expressed at the end of the trajectory (i.e., cells with high NT scores). Enrichment of terms related to neuron differentiation and CNS development corresponds to the potential differentiation of RGCs into mature neuronal states by E16.5.

Regulon activity changes along spatial trajectory

To gain mechanistic insights, we performed gene regulatory network (GRN) analysis using the SCENIC workflow and explore regulon activity changes along spatial trajectory.

E14_RGC_regulon_aucell_df = regulon_aucell_df.loc[target_cells.index]
E14_RGC_regulon_aucell_df = E14_RGC_regulon_aucell_df.join(1-ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
E14_RGC_regulon_aucell_df.head()
Ahr Alx1 Alx3 Alx4 Ar Arid3a Arntl2 Arx Atf1 Atf3 ... Zfp874b Zfp941 Zfp944 Zfp979 Zic1 Zkscan14 Zkscan16 Zscan20 Zxdc Cell_NTScore
Cell_ID
E14_E1S3_171289 0.0 0.015500 0.014530 0.000000 0.0 0.003071 0.000000 0.072864 0.0 0.156601 ... 0.001970 0.000000 0.0 0.0 0.026855 0.056960 0.023259 0.0 0.0 0.105795
E14_E1S3_171863 0.0 0.018301 0.017757 0.032068 0.0 0.003840 0.000000 0.000000 0.0 0.169815 ... 0.004407 0.000000 0.0 0.0 0.092449 0.063594 0.026734 0.0 0.0 0.020465
E14_E1S3_171967 0.0 0.011018 0.009434 0.000000 0.0 0.001884 0.000000 0.071136 0.0 0.137201 ... 0.018034 0.085787 0.0 0.0 0.033001 0.046866 0.017807 0.0 0.0 0.019447
E14_E1S3_171983 0.0 0.020709 0.000000 0.003596 0.0 0.000000 0.020299 0.099791 0.0 0.034300 ... 0.000000 0.000000 0.0 0.0 0.054828 0.000000 0.000000 0.0 0.0 0.041343
E14_E1S3_172013 0.0 0.009773 0.008186 0.000000 0.0 0.001552 0.228530 0.000000 0.0 0.131016 ... 0.000000 0.000000 0.0 0.0 0.045381 0.043549 0.016186 0.0 0.0 0.023282

5 rows × 305 columns

E14_RGC_regulon_aucell_metacell_data_df = construct_meta_cell_along_trajectory(meta_data_df = E14_RGC_regulon_aucell_df,
                                                                trajectory = 'Cell_NTScore',
                                                                n_cells = 10)
E14_RGC_regulon_aucell_metacell_data_df.head()
Ahr Alx1 Alx3 Alx4 Ar Arid3a Arntl2 Arx Atf1 Atf3 ... Zfp874b Zfp941 Zfp944 Zfp979 Zic1 Zkscan14 Zkscan16 Zscan20 Zxdc Cell_NTScore
Cell_ID
E14_E1S3_173789 0.0 0.012552 0.019366 0.011585 0.000000 0.014612 0.004832 0.026460 0.0 0.138414 ... 0.019770 0.0 0.0 0.000000 0.079209 0.051178 0.022155 0.0 0.0 0.000230
E14_E1S3_173259 0.0 0.019238 0.021671 0.014015 0.000000 0.003680 0.004246 0.011797 0.0 0.111139 ... 0.017413 0.0 0.0 0.000000 0.055038 0.033917 0.016856 0.0 0.0 0.000240
E14_E1S3_174106 0.0 0.010374 0.022217 0.004985 0.014738 0.008334 0.023277 0.013118 0.0 0.117493 ... 0.002832 0.0 0.0 0.015804 0.068224 0.063113 0.023028 0.0 0.0 0.000250
E14_E1S3_173417 0.0 0.024025 0.014273 0.005424 0.000000 0.005799 0.005877 0.000000 0.0 0.157988 ... 0.002603 0.0 0.0 0.000000 0.046749 0.056830 0.028127 0.0 0.0 0.000266
E14_E1S3_173284 0.0 0.016657 0.011069 0.011577 0.014097 0.004353 0.021781 0.049954 0.0 0.122188 ... 0.008053 0.0 0.0 0.019991 0.072346 0.039267 0.028010 0.0 0.0 0.000285

5 rows × 305 columns

regulon_correlated_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    rho_threshold=0.4,
    p_val_threshold=0.01
)
print(regulon_correlated_df.shape)
(10, 2)
/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])
regulon_correlated_df.head()
PCC P_Value
Feature
Nfatc4 0.525306 0.000090
Lhx2 0.501632 0.000206
Neurod1 0.476846 0.000464
Sox6 0.414478 0.002767
Tcf7l2 0.402664 0.003743

Diagnosis of Selected Regulons

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'En2',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Regulon Activity',
    ci = 95,
)
../_images/7b2210adc7603ef461a38a685fc2de226d99d362d343646e7546f9a7438149bf.png
fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Hes6',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Regulon Activity',
    ci = 95,
)
../_images/f0e0148d19bdd5fb2e072f22aa91f53049bac306791706f64abd4830105ff980.png

We identified several regulons whose activity scores are significantly correlated with cell-level NT scores (spatial trajectory).

Session Info

import session_info

session_info.show()
Click to view session information
-----
ONTraC              1.2.2
gseapy              1.1.5
matplotlib          3.8.4
numpy               1.26.4
pandas              2.2.1
requests            2.31.0
scipy               1.14.1
seaborn             0.13.2
session_info        1.0.0
statsmodels         0.14.4
-----
Click to view modules imported as dependencies
PIL                 10.3.0
asttokens           NA
certifi             2024.02.02
charset_normalizer  3.3.2
colorama            0.4.6
comm                0.2.2
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
debugpy             1.8.1
decorator           5.1.1
executing           2.0.1
idna                3.7
ipykernel           6.29.4
jedi                0.19.1
kiwisolver          1.4.5
matplotlib_inline   0.1.7
mpl_toolkits        NA
networkx            3.3
packaging           24.0
parso               0.8.4
patsy               0.5.6
platformdirs        4.2.1
prompt_toolkit      3.0.43
psutil              5.9.8
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.17.2
pyparsing           3.1.2
pytz                2024.1
six                 1.16.0
stack_data          0.6.3
tornado             6.4
traitlets           5.14.3
typing_extensions   NA
urllib3             2.2.1
wcwidth             0.2.13
yaml                6.0.1
zmq                 26.0.2
-----
IPython             8.23.0
jupyter_client      8.6.1
jupyter_core        5.7.2
-----
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-5.14.0-427.13.1.el9_4.x86_64-x86_64-with-glibc2.34
-----
Session information updated at 2025-08-24 12:14