Spatial trajectory-based analysis¶

This tutorial demonstrates how to analyze feature changes along the trajectory inferred by ONTraC.

Features may include cell type composition, gene expression, regulon activity, or any other cell-level or niche-level scores.

Load Modules¶

import numpy as np
import pandas as pd
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['ps.fonttype'] = 42
mpl.rcParams['font.sans-serif'] = 'Arial'
import matplotlib.pyplot as plt
import seaborn as sns
from pprint import pprint
%matplotlib inline

from optparse import Values

from ONTraC.analysis.data import AnaData
from ONTraC.utils import write_version_info

write_version_info()

##################################################################################

         ▄▄█▀▀██   ▀█▄   ▀█▀ █▀▀██▀▀█                   ▄▄█▀▀▀▄█
        ▄█▀    ██   █▀█   █     ██    ▄▄▄ ▄▄   ▄▄▄▄   ▄█▀     ▀
        ██      ██  █ ▀█▄ █     ██     ██▀ ▀▀ ▀▀ ▄██  ██
        ▀█▄     ██  █   ███     ██     ██     ▄█▀ ██  ▀█▄      ▄
         ▀▀█▄▄▄█▀  ▄█▄   ▀█    ▄██▄   ▄██▄    ▀█▄▄▀█▀  ▀▀█▄▄▄▄▀

                        version: 1.2.2

##################################################################################

from ONTraC.analysis.trajectory import (construct_meta_cell_along_trajectory,
                                        cal_features_correlation_along_trajectory,
                                        plot_scatter_feat_along_trajectory,
                                        plot_cell_type_composition_along_trajectory_from_anadata,
                                        plot_cell_type_composition_along_trajectory)

from scipy import stats
from statsmodels.stats.multitest import multipletests

import gseapy as gp

Dataset Explanation¶

The Stereo-seq mouse embryonic midbrain dataset was originally published by Chen, et al. The raw data could be download from the MOSTA website.

We assume that you have runned ONTraC on Stereo-seq mouse embryonic midbrain dataset according to our example tutorial.

Load Data¶

Download dataset¶

import requests

# URL of the file
url = "https://zenodo.org/records/15571644/files/Stereo_seq_data.zip"

# Local file path to save the file
file_path = "./Stereo_seq_data.zip"

try:
    # Send a GET request to the URL
    response = requests.get(url)
    response.raise_for_status()  # Check if the request was successful

    # Write the content to the file
    with open(file_path, "wb") as file:
        file.write(response.content)

    print(f"File downloaded and saved to {file_path}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

File downloaded and saved to ./Stereo_seq_data.zip

import zipfile

# Path to the zip file
zip_file_path = "Stereo_seq_data.zip"

# Directory where files will be extracted
extract_to_path = "./"

try:
    # Open the zip file
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        # Extract all files to the specified directory
        zip_ref.extractall(extract_to_path)

    print(f"Files extracted to '{extract_to_path}'")
except zipfile.BadZipFile:
    print("The file is not a valid zip file.")

Files extracted to './'

ONTraC output¶

vis_options = Values()
vis_options.NN_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_NN/'
vis_options.GNN_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_GNN/'
vis_options.NT_dir = './Stereo_seq_data/ONTraC_output/stereo_midbrain_base_NT/'
vis_options.reverse = True
vis_options.output = None

ana_data = AnaData(vis_options)

ana_data.meta_data_df.head()

	Sample	Cell_Type	x	y
Cell_ID
E12_E1S3_100034	E12_E1S3	Fibro	15940.0	18584.0
E12_E1S3_100035	E12_E1S3	Fibro	15942.0	18623.0
E12_E1S3_100191	E12_E1S3	Endo	15965.0	18619.0
E12_E1S3_100256	E12_E1S3	Fibro	15969.0	18717.0
E12_E1S3_100264	E12_E1S3	Fibro	15974.0	18692.0

Differentiation potency¶

The differentialtion potency was calculated using moscot.

Please refer our processing codes for details.

ot_res1 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_1_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res1.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E1S3_loc.csv.gz',index_col = 0)
ot_res1.columns = temp.index
ot_res1.head()

	...	E16_E1S3_6920	E16_E1S3_6923	E16_E1S3_6928	E16_E1S3_6931
E14_E1S3_170808	...	0.000000e+00	0.000000	0.000000e+00	0.000000e+00
E14_E1S3_170916	...	6.345486e-31	0.000154	0.000000e+00	4.531762e-26
E14_E1S3_170934	...	0.000000e+00	0.000000	2.703351e-37	0.000000e+00
E14_E1S3_171016	...	0.000000e+00	0.000000	0.000000e+00	0.000000e+00
E14_E1S3_171024	...	0.000000e+00	0.000000	0.000000e+00	0.000000e+00

5 rows × 6643 columns

ot_res2 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_2_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res2.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E2S6_loc.csv.gz',index_col = 0)
ot_res2.columns = temp.index
ot_res2.head()

	...	E16_E2S6_36348	E16_E2S6_36407	E16_E2S6_36523	E16_E2S6_36755
E14_E1S3_170808	...	0.000000	0.000000	0.000000e+00	0.000000e+00
E14_E1S3_170916	...	0.000000	0.000000	0.000000e+00	0.000000e+00
E14_E1S3_170934	...	0.000112	0.000002	9.689674e-31	8.916933e-08
E14_E1S3_171016	...	0.000000	0.000000	0.000000e+00	1.239693e-06
E14_E1S3_171024	...	0.000000	0.000000	0.000000e+00	0.000000e+00

5 rows × 5506 columns

ot_res3 = pd.read_csv('./Stereo_seq_data/source/moscot/E14_E16_3_cm.csv.gz', index_col=0)
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E14_E1S3_loc.csv.gz',index_col = 0)
ot_res3.index = temp.index
temp = pd.read_csv('./Stereo_seq_data/source/moscot/ss0_E16_E2S7_loc.csv.gz',index_col = 0)
ot_res3.columns = temp.index
ot_res3.head()

	E16_E2S7_291165	E16_E2S7_291435	E16_E2S7_291503	...
E14_E1S3_170808	1.598103e-18	7.107961e-09	1.541208e-25	...
E14_E1S3_170916	0.000000e+00	0.000000e+00	0.000000e+00	...
E14_E1S3_170934	0.000000e+00	0.000000e+00	0.000000e+00	...
E14_E1S3_171016	0.000000e+00	0.000000e+00	0.000000e+00	...
E14_E1S3_171024	0.000000e+00	0.000000e+00	0.000000e+00	...

5 rows × 7261 columns

Gene expression¶

E14_RGC_scaled_exp = pd.read_csv('./Stereo_seq_data/source/stereo_seq_E14_RGC_scaled_exp.csv.gz', index_col=0)
E14_RGC_scaled_exp.head()

	0610005C13Rik	0610006L08Rik	0610009B22Rik	0610009O20Rik	0610010F05Rik	0610010K14Rik	0610012G03Rik	0610025J13Rik	0610030E20Rik	0610031O16Rik	...	Zw10	Zwilch	Zwint	Zxdb	Zxdc	Zyg11a	Zyg11b	Zyx	Zzef1	Zzz3
Cell_ID
E14_E1S3_171289	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	3.277203	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.091104	-0.068489	-0.389927	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425
E14_E1S3_171863	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.307818	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.091104	-0.068489	-0.389927	-0.073764	9.345431	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425
E14_E1S3_171967	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.307818	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.091104	-0.068489	-0.389927	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425
E14_E1S3_171983	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	2.074143	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.091104	-0.068489	-0.389927	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425
E14_E1S3_172013	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.307818	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.091104	-0.068489	-0.389927	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425

5 rows × 23944 columns

Regulon activities¶

Regulon activities were calculated using pySCENIC.

Please refer our processing codes for details.

regulon_aucell_df = pd.read_csv('./Stereo_seq_data/source/stereo_seq.auc.csv.gz', index_col=0)
regulon_aucell_df.head()

	Ahr	Alx1	Alx3	Alx4	Ar	Arid3a	Arntl2	Arx	Atf1	Atf3	...	Zfp821	Zfp874b	Zfp941	Zfp944	Zfp979	Zic1	Zkscan14	Zkscan16	Zscan20	Zxdc
Cell
E12_E1S3_100034	0.0	0.014262	0.009032	0.038966	0.0	0.001760	0.0	0.000000	0.0	0.134952	...	0.0	0.000000	0.0	0.0	0.0	0.051056	0.045712	0.017288	0.0	0.0
E12_E1S3_100035	0.0	0.017741	0.017087	0.000000	0.0	0.003681	0.0	0.000000	0.0	0.167847	...	0.0	0.003940	0.0	0.0	0.0	0.064695	0.062440	0.026075	0.0	0.0
E12_E1S3_100191	0.0	0.009400	0.009933	0.026062	0.0	0.001426	0.0	0.000000	0.0	0.129891	...	0.0	0.000000	0.0	0.0	0.0	0.088422	0.042828	0.015757	0.0	0.0
E12_E1S3_100256	0.0	0.017928	0.017399	0.000000	0.0	0.003750	0.0	0.000000	0.0	0.168128	...	0.0	0.004096	0.0	0.0	0.0	0.048256	0.062728	0.091552	0.0	0.0
E12_E1S3_100264	0.0	0.019671	0.019565	0.042947	0.0	0.004231	0.0	0.080958	0.0	0.176000	...	0.0	0.005547	0.0	0.0	0.0	0.049611	0.066910	0.028448	0.0	0.0

5 rows × 304 columns

Msigdb gene sets¶

Please refer release note for details.

import requests

# URL of the file
url = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.5.1/c5.go.bp.v7.5.1.symbols.gmt"

# Local file path to save the file
file_path = "./c5.go.bp.v7.5.1.symbols.gmt"

try:
    # Send a GET request to the URL
    response = requests.get(url)
    response.raise_for_status()  # Check if the request was successful

    # Write the content to the file
    with open(file_path, "wb") as file:
        file.write(response.content)

    print(f"File downloaded and saved to {file_path}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

File downloaded and saved to ./c5.go.bp.v7.5.1.symbols.gmt

Dataset overview¶

from ONTraC.analysis.cell_type import plot_spatial_cell_type_distribution_dataset_from_anadata

fig, axes = plot_spatial_cell_type_distribution_dataset_from_anadata(ana_data = ana_data,
                                                                     hue_order = ['RGC', 'GlioB', 'NeuB', 'GluNeuB', 'GluNeu', 'GABA', 'Ery', 'Endo', 'Fibro', 'Basal'])

for ax in axes:
    ax.legend(title='Cell Type', loc='upper left', bbox_to_anchor=(1,1), markerscale=2.5)
    # ax.set_aspect('equal', 'box')  # uncomment this line if you want set the x and y axis with same scaling
    # ax.set_xticks([])  # uncomment this line if you don't want to show x coordinates
    # ax.set_yticks([])  # uncomment this line if you don't want to show y coordinates
    # ax.invert_xaxis()  # uncomment this line if you want to invert x axis
    # ax.invert_yaxis()  # uncomment this line if you want to invert y axis
    
    pass

../_images/d266ba8de27a3879971c1ba5e00387f21b71ba849033e96d5601e28fba4b9010.png

Spatial distribution of NT scores¶

We visualize the spatial distribution of NT scores for each cell to illustrate the spatial trajectory.

Spatial distribution of NT scores for all cells¶

from ONTraC.analysis.spatial import plot_cell_NT_score_dataset_from_anadata

fig, axes = plot_cell_NT_score_dataset_from_anadata(ana_data)

../_images/cd7ec9ac6e045c186cdffd66258fd990e01c4b064fa007fa7b1190939ab18982.png

Spatial distribution of NT scores for E14.5 RGCs only¶

# Selecting E14.5 RGCs from the dataset
plot_meta_data = ana_data.meta_data_df[(ana_data.meta_data_df['Sample'] == 'E14_E1S3') & (ana_data.meta_data_df['Cell_Type'] == 'RGC')]
plot_meta_data.head()

	Sample	Cell_Type	x	y
Cell_ID
E14_E1S3_171289	E14_E1S3	RGC	19941.0	9116.0
E14_E1S3_171863	E14_E1S3	RGC	20024.0	9512.0
E14_E1S3_171967	E14_E1S3	RGC	20040.0	9397.0
E14_E1S3_171983	E14_E1S3	RGC	20040.0	9219.0
E14_E1S3_172013	E14_E1S3	RGC	20036.0	9352.0

plot_NT_score = ana_data.NT_score.loc[plot_meta_data.index]
plot_NT_score.head()

	x	y	Niche_NTScore	Cell_NTScore
Cell_ID
E14_E1S3_171289	19941.0	9116.0	0.862209	0.894205
E14_E1S3_171863	20024.0	9512.0	0.983229	0.979535
E14_E1S3_171967	20040.0	9397.0	0.994614	0.980553
E14_E1S3_171983	20040.0	9219.0	0.988135	0.958657
E14_E1S3_172013	20036.0	9352.0	0.994703	0.976718

# visualization

from ONTraC.analysis.spatial import plot_cell_NT_score_dataset

fig, ax = plot_cell_NT_score_dataset(meta_data_df=plot_meta_data,
                                     NT_score=plot_NT_score,
                                     reverse=ana_data.options.reverse
                                   )

../_images/50eb11f0d2dc9705d0ee4f919b01f1942d12af9ab324c68ea4501c235c78122a.png

Cell type composition along spatial trajcetory¶

Cell type composition along the spatial trajectory reflects changes in the microenvironment.

Cell type composition change for all cells¶

fig, ax = plot_cell_type_composition_along_trajectory_from_anadata(
    ana_data=ana_data,  # AnaData object
    cell_types=None,  # Column name(s) in AnaData.meta_data_df that contains the cell type information.
                      # Default is None, which means all cell types in AnaData.cell_type_codes will be used.
    agg_cell_num=100,  # Number of cells to aggregate in each bin along the trajectory. Default is 10. 1 means no aggregation.
    figsize=(8,4),  # Figure size. Default is (6, 2).
    palette=None,  # Color palette for cell types. If None, use default color palette. Keys are cell types and values are colors.
    output_file_path=None  # Path to save the figure. If None, the default path
                           # {ana_data.options.output}/lineplot_raw_cell_type_composition_along_trajectory.pdf is used. 
                           # If ana_data.options.output is also None, the figure will not be saved and the function 
                           # will return the figure and axes objects instead.
)

../_images/7758c8dc2d0e8910525172916e7ca78dff4a5a4b52813ef6fb8e0d9abad5f78a.png

We observe that the dominant cell type shifts along the spatial trajectory, from RGCs (undifferentiated cells) to NeuB, GluNeuB, and ultimately to fully differentiated cells such as GluNeu and GABA neurons.

Cell type composition change around RGCs only¶

# create data_df
data_df = ana_data.meta_data_df.copy()
data_df = data_df.join(1 - ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
data_df = data_df.join(ana_data.cell_type_composition)

# filtering with cell type
data_df = data_df[data_df['Cell_Type'] == 'RGC']

cell_types = ana_data.cell_type_codes['Cell_Type'].values.tolist()

fig ,ax = plot_cell_type_composition_along_trajectory(
    data_df=data_df,
    trajectory='Cell_NTScore',
    cell_types=cell_types, # type: ignore
    agg_cell_num=100,
    figsize=(7,3),
    palette=None,
    output_file_path=None,
)

../_images/4eb91a114da54ad48041131cb2fe65c0806d76da299f303fec0fc6ddcf3d9595.png

In E14.5, RGCs are primarily located within RGC-dominant microenvironments. Along the spatial trajectory, the proportion of RGCs gradually decreases, while NeuB and GluNeuB cells increase sequentially. We will explore the features associated with this dynamic in the following sections.

Differentiation potency along trajectory¶

We can investigate differentiation potency by using Moscot to predict the potential offspring of RGCs in the next stage (E16.5). The Moscot output have been loaded in previous section.

Selecting RGCs from Moscot results and aggregate replicates information

target_cells = ana_data.meta_data_df[(ana_data.meta_data_df['Cell_Type'] == 'RGC') & (ana_data.meta_data_df['Sample'] == 'E14_E1S3')]

def ot_res_process(ot_res):
    ot_res = ot_res.loc[ot_res.index.isin(ana_data.meta_data_df.index),
                       ot_res.columns.isin(ana_data.meta_data_df.index)]
    top_5_indices = ot_res.apply(lambda row: row.nlargest(5).index, axis=1)
    top_5_cell_types = top_5_indices.apply(lambda x: ana_data.meta_data_df.loc[x, 'Cell_Type'])
    summary_df = top_5_cell_types.apply(lambda x: x.value_counts(), axis=1).fillna(0).astype(int)
    summary_df = summary_df.loc[target_cells.index]
    summary_df = summary_df.div(summary_df.sum(axis=1).values, axis=0)
    summary_df = summary_df.join(1-ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
    
    return summary_df
    

summary_df_1 = ot_res_process(ot_res1)
summary_df_1.head()

	Basal	Endo	Ery	Fibro	GABA	GlioB	GluNeu	GluNeuB	NeuB	RGC	Cell_NTScore
Cell_ID
E14_E1S3_171289	0.0	0.0	0.0	0.2	0.4	0.0	0.0	0.0	0.0	0.4	0.105795
E14_E1S3_171863	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.8	0.0	0.2	0.020465
E14_E1S3_171967	0.0	0.0	0.0	0.0	0.0	0.4	0.0	0.6	0.0	0.0	0.019447
E14_E1S3_171983	0.0	0.0	0.0	0.0	0.2	0.0	0.0	0.8	0.0	0.0	0.041343
E14_E1S3_172013	0.0	0.0	0.0	0.0	0.2	0.0	0.0	0.4	0.0	0.4	0.023282

E14_RGC_metacell_diff_p_1_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_1,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_1_df.head()

	Basal	Endo	Ery	Fibro	GABA	GlioB	GluNeu	GluNeuB	NeuB	RGC	Cell_NTScore
Cell_ID
E14_E1S3_173789	0.0	0.0	0.00	0.00	0.02	0.16	0.00	0.14	0.30	0.38	0.000230
E14_E1S3_173259	0.0	0.0	0.00	0.02	0.00	0.16	0.02	0.22	0.20	0.38	0.000240
E14_E1S3_174106	0.0	0.0	0.02	0.00	0.00	0.06	0.02	0.24	0.16	0.50	0.000250
E14_E1S3_173417	0.0	0.0	0.00	0.02	0.08	0.14	0.02	0.30	0.04	0.40	0.000266
E14_E1S3_173284	0.0	0.0	0.06	0.00	0.08	0.10	0.02	0.24	0.16	0.34	0.000285

summary_df_2 = ot_res_process(ot_res2)
E14_RGC_metacell_diff_p_2_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_2,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_2_df.head()

	Basal	Endo	Ery	Fibro	GABA	GlioB	GluNeu	GluNeuB	NeuB	RGC	Cell_NTScore
Cell_ID
E14_E1S3_173789	0.0	0.0	0.02	0.06	0.12	0.12	0.02	0.06	0.10	0.50	0.000230
E14_E1S3_173259	0.0	0.0	0.00	0.00	0.08	0.14	0.00	0.04	0.14	0.60	0.000240
E14_E1S3_174106	0.0	0.0	0.02	0.04	0.04	0.00	0.02	0.08	0.12	0.68	0.000250
E14_E1S3_173417	0.0	0.0	0.02	0.00	0.04	0.14	0.00	0.08	0.06	0.66	0.000266
E14_E1S3_173284	0.0	0.0	0.00	0.02	0.10	0.12	0.02	0.08	0.08	0.58	0.000285

summary_df_3 = ot_res_process(ot_res3)
E14_RGC_metacell_diff_p_3_df = construct_meta_cell_along_trajectory(
    meta_data_df = summary_df_3,
    trajectory = 'Cell_NTScore',
    n_cells = 10)
E14_RGC_metacell_diff_p_3_df.head()

	Basal	Endo	Ery	Fibro	GABA	GlioB	GluNeu	GluNeuB	NeuB	RGC	Cell_NTScore
Cell_ID
E14_E1S3_173789	0.00	0.0	0.00	0.12	0.02	0.02	0.04	0.10	0.04	0.66	0.000230
E14_E1S3_173259	0.02	0.0	0.02	0.04	0.02	0.02	0.00	0.06	0.02	0.80	0.000240
E14_E1S3_174106	0.00	0.0	0.12	0.06	0.08	0.02	0.02	0.08	0.00	0.62	0.000250
E14_E1S3_173417	0.02	0.0	0.04	0.06	0.04	0.02	0.04	0.14	0.00	0.64	0.000266
E14_E1S3_173284	0.00	0.0	0.02	0.06	0.12	0.04	0.08	0.20	0.00	0.48	0.000285

E14_RGC_metacell_diff_p_1_melted_df = E14_RGC_metacell_diff_p_1_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_1_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_1_melted_df['replicate'] = ['rep1'] * E14_RGC_metacell_diff_p_1_melted_df.shape[0]
E14_RGC_metacell_diff_p_1_melted_df.head()

	Cell_NTScore	Cell type	replicate
0	0.000230	Basal	rep1
1	0.000240	Basal	rep1
2	0.000250	Basal	rep1
3	0.000266	Basal	rep1
4	0.000285	Basal	rep1

E14_RGC_metacell_diff_p_2_melted_df = E14_RGC_metacell_diff_p_2_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_2_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_2_melted_df['replicate'] = ['rep2'] * E14_RGC_metacell_diff_p_2_melted_df.shape[0]
E14_RGC_metacell_diff_p_2_melted_df.head()

	Cell_NTScore	Cell type	replicate
0	0.000230	Basal	rep2
1	0.000240	Basal	rep2
2	0.000250	Basal	rep2
3	0.000266	Basal	rep2
4	0.000285	Basal	rep2

E14_RGC_metacell_diff_p_3_melted_df = E14_RGC_metacell_diff_p_3_df.melt(
    id_vars='Cell_NTScore',
    value_vars=E14_RGC_metacell_diff_p_3_df.columns.tolist()[:-1],
    var_name='Cell type',
    value_name='Proportion')
E14_RGC_metacell_diff_p_3_melted_df['replicate'] = ['rep3'] * E14_RGC_metacell_diff_p_3_melted_df.shape[0]
E14_RGC_metacell_diff_p_3_melted_df.head()

	Cell_NTScore	Cell type	Proportion	replicate
0	0.000230	Basal	0.00	rep3
1	0.000240	Basal	0.02	rep3
2	0.000250	Basal	0.00	rep3
3	0.000266	Basal	0.02	rep3
4	0.000285	Basal	0.00	rep3

E14_RGC_metacell_diff_p_melted = pd.concat([E14_RGC_metacell_diff_p_1_melted_df,
                                            E14_RGC_metacell_diff_p_2_melted_df,
                                            E14_RGC_metacell_diff_p_3_melted_df])
E14_RGC_metacell_diff_p_melted.head()

	Cell_NTScore	Cell type	replicate
0	0.000230	Basal	rep1
1	0.000240	Basal	rep1
2	0.000250	Basal	rep1
3	0.000266	Basal	rep1
4	0.000285	Basal	rep1

sns.lmplot(data = E14_RGC_metacell_diff_p_melted[E14_RGC_metacell_diff_p_melted['Cell type'].isin(['RGC', 'NeuB'])],
           x = 'Cell_NTScore',
           y = 'Proportion',
           hue = 'Cell type',
           scatter_kws={'edgecolor': None},
           ci=95,
          )

<seaborn.axisgrid.FacetGrid at 0x151b22d76110>

../_images/99bf0669d162d1a93ba438dd0a89bbaa77143aec58edb8549746fa07747a207e.png

Along the spatial trajectory, the probability of RGCs maintaining their identity decreases, while the probability of their differentiation into NeuB increases.

Gene expression changes along spatial trajectory¶

Next, we explore gene expression dynamics along the spatial trajectory.

E14_RGC_gene_exp_df = E14_RGC_scaled_exp.join(1 - ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])

# The meta-cell could reduce the noise here
E14_RGC_gene_exp_metacell_data_df = construct_meta_cell_along_trajectory(
    meta_data_df = E14_RGC_gene_exp_df,
    trajectory = 'Cell_NTScore',
    n_cells = 10
)
E14_RGC_gene_exp_metacell_data_df.head()

	0610005C13Rik	0610006L08Rik	0610009B22Rik	0610009O20Rik	0610010F05Rik	0610010K14Rik	0610012G03Rik	0610025J13Rik	0610030E20Rik	0610031O16Rik	...	Zwilch	Zwint	Zxdb	Zxdc	Zyg11a	Zyg11b	Zyx	Zzef1	Zzz3	Cell_NTScore
Cell_ID
E14_E1S3_173789	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.098032	0.516922	-0.025192	-0.088172	-0.008486	...	0.938360	0.371673	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	0.353277	0.000230
E14_E1S3_173259	-0.036511	-0.008535	-0.198124	-0.106792	0.117162	0.537851	-0.049493	-0.025192	-0.088172	-0.008486	...	-0.068489	0.011324	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	0.946734	-0.199425	0.000240
E14_E1S3_174106	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.093338	0.008216	-0.025192	-0.088172	-0.008486	...	-0.068489	-0.117173	0.933613	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	0.234650	0.000250
E14_E1S3_173417	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	-0.064664	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.068489	-0.081511	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425	0.000266
E14_E1S3_173284	-0.036511	-0.008535	-0.198124	-0.106792	-0.239179	0.321239	-0.321011	-0.025192	-0.088172	-0.008486	...	-0.068489	-0.389927	-0.073764	-0.117644	-0.043855	-0.223836	-0.21233	-0.148131	-0.199425	0.000285

5 rows × 23945 columns

gene_correlated_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    rho_threshold=0.4,
    p_val_threshold=0.01
)
print(gene_correlated_df.shape)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])

(109, 2)

We identified 109 genes whose expression levels are significantly correlated with cell-level NT scores. Here, we highlight six representative genes that are strongly associated with neuronal differentiation and maturation.

gene_correlated_df.head()

	PCC	P_Value
Feature
Cdkn1c	0.629617	9.666948e-07
Gm3764	0.623302	1.333598e-06
Lbh	0.618297	1.712368e-06
Bbs9	0.550067	3.502920e-05
Phf21b	0.542046	4.788367e-05

Different Filtering Parameters¶

You can also select top N genes by following command:

cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    top_n=5,
    rho_threshold=0.4,
    p_val_threshold=0.01
)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])

	PCC	P_Value
Feature
Cdkn1c	0.629617	9.666948e-07
Gm3764	0.623302	1.333598e-06
Lbh	0.618297	1.712368e-06
Bbs9	0.550067	3.502920e-05
Phf21b	0.542046	4.788367e-05
Mcm3	-0.453214	9.492690e-04
Eif2s2	-0.456194	8.696506e-04
Tuba1b	-0.470727	5.608145e-04
Ccnd2	-0.479827	4.218648e-04
Ppia	-0.484622	3.619282e-04

Heatmap Showing Trajectory-Associated Genes¶

filtered_metacell_gene_exp_df = E14_RGC_gene_exp_metacell_data_df.loc[:,gene_correlated_df.index]
filtered_metacell_gene_exp_df.head()

Feature	Cdkn1c	Gm3764	Lbh	Bbs9	Phf21b	Helt	Nkain3	Pgpep1	Rhbdl3	Akr1b8	...	Cd24a	Rps8	Rpl22l1	Rpl19	Phgdh	Mcm3	Eif2s2	Tuba1b	Ccnd2	Ppia
Cell_ID
E14_E1S3_173789	-0.126359	-0.152949	0.148899	-0.111262	-0.150273	-0.108984	-0.237243	-0.167096	-0.122761	-0.052675	...	-0.418015	0.575271	0.220949	0.309830	0.164429	-0.159499	0.234015	0.449034	0.433422	0.543230
E14_E1S3_173259	-0.570163	-0.152949	-0.192196	-0.111262	-0.150273	-0.108984	-0.237243	-0.167096	-0.122761	-0.052675	...	-0.251175	0.916426	0.492244	0.625080	1.361812	0.350744	0.431511	0.532895	0.642079	0.112190
E14_E1S3_174106	-0.570163	-0.152949	-0.192196	-0.111262	-0.150273	-0.108984	-0.237243	-0.167096	-0.122761	-0.052675	...	-0.207797	0.531247	-0.033359	0.499405	1.262983	1.341346	0.214287	0.608114	-0.371306	0.839393
E14_E1S3_173417	-0.570163	-0.152949	-0.192196	-0.111262	-0.150273	-0.108984	-0.237243	-0.167096	-0.122761	-0.052675	...	-0.593854	0.697930	0.274315	-0.226063	-0.240031	1.718599	-0.099978	0.528772	0.449835	0.103245
E14_E1S3_173284	-0.570163	-0.152949	-0.192196	-0.111262	-0.150273	-0.108984	-0.237243	-0.167096	-0.122761	-0.052675	...	-0.371482	0.731672	0.595815	0.640199	0.716297	1.006487	0.340025	0.822160	0.210475	0.089651

5 rows × 109 columns

fig, ax = plt.subplots(figsize=(5, 20))
sns.heatmap(filtered_metacell_gene_exp_df.iloc[::-1,].T,
            cmap='bwr',
            vmin=-2,
            vmax=2,
            cbar_kws={'label': 'Normalized Gene Expression'},
            ax=ax)
ax.set_xticks([])
ax.set_xlabel('0 -------------- NT Scores --------------> 1')
ax.set_ylabel('Genes')

Text(33.222222222222214, 0.5, 'Genes')

../_images/8a14bd6b6a54d0db538d82a02756956d8770dbc039813cf6b6ad5f5c26fadec9.png

Diagnosis of Selected Genes¶

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Ppia',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
)

../_images/804a72e4f35e0e06d94561cc53392ce404c7ceb4b564a7cb1138f4565f5a98f7.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Cdkn1c',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    ci=95,  # Size of the confidence interval for the regression estimate
)

../_images/b1f8347ec6b013a5d09f708ff990de1d9fd761ecf95ff5f8d716c9b973a66fed.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Ccnd2',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    line_kws = {'color': 'green', 'lw': 4},  # line parameters
    ci=70,  # Size of the confidence interval for the regression estimate
)

../_images/631d7c7c078fb2498834b160fa54706230837a961eeb7693714ff3b4ffd71aeb.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Phf21b',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'orange', 'edgecolor': 'gray', 's': 50},  # scatter parameters
)

../_images/eac53cd413aa8f024034ed446769da3ca1d7d17e3364874e2651d0b9f82cf0a2.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Efna5',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'purple', 'edgecolor': 'gray', 's': 50},  # scatter parameters
    line_kws = {'color': 'blue', 'lw': 4, 'ls': '--'},  # line parameters
    ci = 95,  # Size of the confidence interval for the regression estimate
    marker = '*',  # Marker to use for the scatterplot glyphs.
)

../_images/3f2a45dc72eaf4ad95bb310461ddf2ab1a38ff4f1e540446d1c75cbf2d59579c.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Dcc',
    fit_reg = True,
    annotate_pos = 'upper left',
    figszie = (4,3),
    ylabel = 'Normalized Gene Expression',
    scatter_kws = {'color': 'green', 's': 50},  # scatter parameters
    line_kws = {'color': 'C0', 'lw': 2, 'ls': '-.'},  # line parameters
    ci = 95,  # Size of the confidence interval for the regression estimate.
    marker = 'x',  # Marker to use for the scatterplot glyphs.
)

../_images/82914105f755e22455da9d57062f26cbb51f23c3a98b9f1ea9fed7c6ae78383c.png

Gene Set Enrichment Analysis (GSEA)¶

# keep all genes while calculate correlation
all_gene_correlation_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_gene_exp_metacell_data_df,
    trajectory = 'Cell_NTScore',
)
print(all_gene_correlation_df.shape)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])

(13636, 2)

all_gene_correlation_df.head()

	PCC	P_Value
Feature
Cdkn1c	0.629617	9.666948e-07
Gm3764	0.623302	1.333598e-06
Lbh	0.618297	1.712368e-06
Bbs9	0.550067	3.502920e-05
Phf21b	0.542046	4.788367e-05

all_gene_correlation_df.tail()

	PCC	P_Value
Feature
Mcm3	-0.453214	0.000949
Eif2s2	-0.456194	0.000870
Tuba1b	-0.470727	0.000561
Ccnd2	-0.479827	0.000422
Ppia	-0.484622	0.000362

reject, qvals, _, _ = multipletests(all_gene_correlation_df['P_Value'].values, alpha=0.05, method="fdr_bh")
all_gene_correlation_df.loc[:,'FDR Q-Value'] = qvals
all_gene_correlation_df.head()

	PCC	P_Value	FDR Q-Value
Feature
Cdkn1c	0.629617	9.666948e-07	0.007783
Gm3764	0.623302	1.333598e-06	0.007783
Lbh	0.618297	1.712368e-06	0.007783
Bbs9	0.550067	3.502920e-05	0.119415
Phf21b	0.542046	4.788367e-05	0.130588

all_gene_correlation_df.tail()

	PCC	P_Value	FDR Q-Value
Feature
Mcm3	-0.453214	0.000949	0.311213
Eif2s2	-0.456194	0.000870	0.311213
Tuba1b	-0.470727	0.000561	0.275297
Ccnd2	-0.479827	0.000422	0.225432
Ppia	-0.484622	0.000362	0.224330

# corrected, only one columns should list here, otherwise, there are multiple ranks as inputs
corr_rnk = all_gene_correlation_df.iloc[:,:1]
corr_rnk.head()

	PCC
Feature
Cdkn1c	0.629617
Gm3764	0.623302
Lbh	0.618297
Bbs9	0.550067
Phf21b	0.542046

corr_rnk.tail()

	PCC
Feature
Mcm3	-0.453214
Eif2s2	-0.456194
Tuba1b	-0.470727
Ccnd2	-0.479827
Ppia	-0.484622

pre_res = gp.prerank(rnk=corr_rnk, # or rnk = rnk,
                     gene_sets='./c5.go.bp.v7.5.1.symbols.gmt',
                     threads=4,
                     min_size=5,
                     max_size=1000,
                     permutation_num=1000, # reduce number to speed up testing
                     outdir=None, # don't write to disk
                     seed=6,
                     verbose=True, # see what's going on behind the scenes
                    )

2025-08-24 12:13:10,204 [WARNING] Duplicated values found in preranked stats: 16.66% of genes
The order of those genes will be arbitrary, which may produce unexpected results.
2025-08-24 12:13:10,204 [INFO] Parsing data files for GSEA.............................
2025-08-24 12:13:10,381 [INFO] 1188 gene_sets have been filtered out when max_size=1000 and min_size=5
2025-08-24 12:13:10,382 [INFO] 6470 gene_sets used for further statistical testing.....
2025-08-24 12:13:10,382 [INFO] Start to run GSEA...Might take a while..................
2025-08-24 12:13:10,383 [INFO] Genes are converted to uppercase.
2025-08-24 12:14:15,539 [INFO] Congratulations. GSEApy runs successfully................

res = pre_res.res2d
filter_res = res[[True if x.startswith('GOBP') else False for x in res['Term']]]
filter_res = filter_res[filter_res['NOM p-val'] < 0.05]

filter_res.head()

	Name	Term	ES	NES	FDR q-val	FWER p-val	Tag %	Gene %	Lead_genes
0	prerank	GOBP_CYTOPLASMIC_TRANSLATION	-0.449215	-2.385238	0.004127	0.004	58/135	20.85%	Eif2s2;Rpl19;Rpl22l1;Rps8;Rpl3;Rps23;Rpl8;Rpl4...
1	prerank	GOBP_TRICARBOXYLIC_ACID_CYCLE	-0.624419	-2.374058	0.002579	0.004	9/27	6.50%	Idh2;Sdhd;Suclg2;Dlat;Idh3a;Mrps36;Dlst;Pdhb;Cs
2	prerank	GOBP_DNA_REPLICATION_INITIATION	-0.57303	-2.36083	0.00172	0.004	12/35	11.85%	Mcm3;Mcm5;Topbp1;Pola1;Prim2;Mcm2;Ccne2;Mcm6;M...
3	prerank	GOBP_DNA_UNWINDING_INVOLVED_IN_DNA_REPLICATION	-0.667623	-2.270939	0.006965	0.026	12/20	17.51%	Mcm3;Mcm5;Gins2;Blm;Mcm2;Mcm6;Twnk;Cdc45;Gins4...
4	prerank	GOBP_DNA_DEPENDENT_DNA_REPLICATION	-0.4078	-2.236996	0.00908	0.039	57/144	21.58%	Mcm3;Mcm5;Topbp1;Pola1;Gins2;Blm;Eme1;Rfc1;Cdk...

filter_res.sort_values('NES').head(20)

	Name	Term	ES	NES	NOM p-val	FDR q-val	FWER p-val	Tag %	Gene %	Lead_genes
0	prerank	GOBP_CYTOPLASMIC_TRANSLATION	-0.449215	-2.385238	0.0	0.004127	0.004	58/135	20.85%	Eif2s2;Rpl19;Rpl22l1;Rps8;Rpl3;Rps23;Rpl8;Rpl4...
1	prerank	GOBP_TRICARBOXYLIC_ACID_CYCLE	-0.624419	-2.374058	0.0	0.002579	0.004	9/27	6.50%	Idh2;Sdhd;Suclg2;Dlat;Idh3a;Mrps36;Dlst;Pdhb;Cs
2	prerank	GOBP_DNA_REPLICATION_INITIATION	-0.57303	-2.36083	0.0	0.00172	0.004	12/35	11.85%	Mcm3;Mcm5;Topbp1;Pola1;Prim2;Mcm2;Ccne2;Mcm6;M...
3	prerank	GOBP_DNA_UNWINDING_INVOLVED_IN_DNA_REPLICATION	-0.667623	-2.270939	0.0	0.006965	0.026	12/20	17.51%	Mcm3;Mcm5;Gins2;Blm;Mcm2;Mcm6;Twnk;Cdc45;Gins4...
4	prerank	GOBP_DNA_DEPENDENT_DNA_REPLICATION	-0.4078	-2.236996	0.0	0.00908	0.039	57/144	21.58%	Mcm3;Mcm5;Topbp1;Pola1;Gins2;Blm;Eme1;Rfc1;Cdk...
5	prerank	GOBP_DOUBLE_STRAND_BREAK_REPAIR_VIA_BREAK_INDU...	-0.78386	-2.145294	0.0	0.036972	0.174	10/11	20.09%	Mcm3;Mcm5;Gins2;Mcm2;Mcm6;Cdc45;Gins4;Mcm7;Cdc...
6	prerank	GOBP_NEGATIVE_REGULATION_OF_RNA_SPLICING	-0.572831	-2.084832	0.0	0.069129	0.338	10/25	13.52%	Srsf4;Ptbp1;Hnrnpa2b1;Srsf7;C1qbp;Sap18;Rps26;...
7	prerank	GOBP_DNA_REPLICATION	-0.351203	-2.077697	0.0	0.066808	0.37	99/253	23.39%	Mcm3;Mcm5;Set;Topbp1;Pola1;Nasp;Gins2;Blm;Rrm2...
8	prerank	GOBP_CHROMATIN_REMODELING_AT_CENTROMERE	-0.762886	-2.074276	0.0	0.062136	0.383	5/9	6.65%	Hells;Nasp;Hjurp;Mis18a;Oip5
9	prerank	GOBP_DNA_GEOMETRIC_CHANGE	-0.407036	-2.048352	0.0	0.07728	0.47	30/85	15.53%	Mcm3;Mcm5;Gins2;Blm;Dscc1;Hnrnpa2b1;Brip1;Mcm2...
10	prerank	GOBP_HISTONE_EXCHANGE	-0.635976	-2.041671	0.0	0.077008	0.495	5/16	6.65%	Nasp;Hjurp;Spty2d1;Mis18a;Oip5
11	prerank	GOBP_ATRIAL_CARDIAC_MUSCLE_CELL_TO_AV_NODE_CEL...	-0.680813	-2.04045	0.0	0.072053	0.503	9/13	20.41%	Cacna1c;Cacnb2;Trpm4;Gjc1;Gja1;Scn3b;Kcna5;Kcn...
12	prerank	GOBP_SNORNA_LOCALIZATION	-0.889979	-2.026537	0.0	0.077225	0.564	5/6	9.18%	Fbl;Nop58;Pih1d1;Prpf31;Znhit3
13	prerank	GOBP_DERMATAN_SULFATE_PROTEOGLYCAN_METABOLIC_P...	-0.838072	-2.022415	0.0	0.07532	0.582	6/7	15.40%	B3gat3;Idua;Chst12;Dsel;Csgalnact2;Chst14
15	prerank	GOBP_CENP_A_CONTAINING_CHROMATIN_ORGANIZATION	-0.82405	-1.995891	0.0	0.095268	0.679	4/7	6.65%	Nasp;Hjurp;Mis18a;Oip5
16	prerank	GOBP_POSITIVE_REGULATION_OF_PATTERN_RECOGNITIO...	-0.522305	-1.99507	0.0	0.090152	0.681	17/26	27.00%	Usp15;Rtn4;Ankrd17;Ninj1;Tirap;Hmgb1;Cav1;Ifi3...
17	prerank	GOBP_AEROBIC_RESPIRATION	-0.364381	-1.9943	0.0	0.08582	0.686	50/147	22.85%	Ndufs1;Cox5b;Ndufc2;Ide;Uqcr10;Idh2;Sdhd;Ccnb1...
18	prerank	GOBP_REGULATION_OF_DNA_DIRECTED_DNA_POLYMERASE...	-0.638066	-1.990986	0.0	0.083803	0.695	7/13	14.97%	Gins2;Dscc1;Rfc5;Gins4;Rfc3;Pcna;Polg2
19	prerank	GOBP_NEGATIVE_REGULATION_OF_MRNA_SPLICING_VIA_...	-0.582963	-1.97912	0.004662	0.092806	0.749	9/20	13.52%	Srsf4;Ptbp1;Hnrnpa2b1;Srsf7;C1qbp;Sap18;Sfswap...
20	prerank	GOBP_DOUBLE_STRAND_BREAK_REPAIR	-0.332017	-1.964585	0.0	0.105603	0.802	93/231	24.37%	Mcm3;Mcm5;Pola1;Smchd1;Nabp2;Gins2;Blm;Babam1;...

filter_res.sort_values('NES').tail(20)

	Name	Term	ES	NES	NOM p-val	FDR q-val	FWER p-val	Tag %	Gene %	Lead_genes
92	prerank	GOBP_POSITIVE_REGULATION_OF_HAIR_FOLLICLE_DEVE...	0.725166	1.743163	0.005639	1.0	1.0	3/8	4.61%	Tgfb2;Gal;Krt17
86	prerank	GOBP_REGULATION_OF_PROTEIN_LOCALIZATION_TO_CEN...	0.6737	1.750461	0.010949	1.0	1.0	6/10	11.80%	Bicd1;Cep250;Mark4;Nup62;Pard6a;Ubxn2b
83	prerank	GOBP_PLANAR_CELL_POLARITY_PATHWAY_INVOLVED_IN_...	0.661712	1.751047	0.015652	1.0	1.0	6/11	16.61%	Dvl3;Sfrp2;Fzd1;Dvl1;Nphp3;Sfrp1
77	prerank	GOBP_OLFACTORY_LOBE_DEVELOPMENT	0.498639	1.761532	0.006319	1.0	1.0	10/30	12.64%	Sall1;Eomes;Efna2;Slit1;Chd7;Srf;Erbb4;Robo1;R...
74	prerank	GOBP_REGULATION_OF_FATTY_ACID_BETA_OXIDATION	0.570444	1.76705	0.008606	1.0	1.0	8/17	15.92%	Etfbkmt;Cnr1;Abcd2;Lonp2;Irs1;Mlycd;Tysnd1;Acacb
71	prerank	GOBP_NEGATIVE_REGULATION_OF_COLLATERAL_SPROUTING	0.731298	1.773559	0.007707	1.0	1.0	6/8	16.75%	Dcc;Ptprs;Spp1;Ulk1;Epha7;Ulk2
68	prerank	GOBP_ANTERIOR_POSTERIOR_AXON_GUIDANCE	0.756514	1.780581	0.0	1.0	1.0	3/7	1.69%	Lhx1;Dcc;Lhx9
65	prerank	GOBP_REGULATION_OF_NEURON_PROJECTION_ARBORIZATION	0.621632	1.78363	0.003656	1.0	1.0	6/13	18.11%	Dvl3;Dlg4;Macf1;Dvl1;Map3k13;Ntng1
62	prerank	GOBP_NON_CANONICAL_WNT_SIGNALING_PATHWAY_VIA_M...	0.807137	1.796288	0.001919	1.0	1.0	3/6	0.70%	Wnt4;Dvl3;Nkd1
58	prerank	GOBP_FEEDING_BEHAVIOR	0.448003	1.800532	0.002899	1.0	1.0	15/53	12.48%	Helt;Dach1;Cck;Cnr1;Lepr;Ttc21b;Npy;Gal;Fyn;Po...
57	prerank	GOBP_SPINAL_CORD_ASSOCIATION_NEURON_DIFFERENTI...	0.802871	1.804829	0.003683	1.0	1.0	4/6	10.59%	Lhx1;Ascl1;Gsx1;Tal1
51	prerank	GOBP_METANEPHRIC_NEPHRON_DEVELOPMENT	0.544978	1.826752	0.001715	1.0	0.999	10/26	18.22%	Lhx1;Sall1;Wnt4;Tcf21;Pkd2;Tfap2b;Agtr2;Sox9;L...
45	prerank	GOBP_CALCIUM_DEPENDENT_CELL_CELL_ADHESION_VIA_...	0.545783	1.843529	0.001623	1.0	0.999	12/23	19.58%	Nlgn1;Arvcf;Cdh2;Cdh8;Dchs1;Atp2c1;Cdh9;Cdh13;...
44	prerank	GOBP_DORSAL_SPINAL_CORD_DEVELOPMENT	0.713258	1.851775	0.001795	1.0	0.998	5/10	16.04%	Lhx1;Ascl1;Gsx1;Tal1;Prox1
43	prerank	GOBP_POSITIVE_REGULATION_OF_PROTEIN_LOCALIZATI...	0.793517	1.855019	0.005455	1.0	0.998	5/7	9.18%	Bicd1;Cep250;Mark4;Nup62;Pard6a
40	prerank	GOBP_GLYCOSIDE_METABOLIC_PROCESS	0.681614	1.858309	0.003676	1.0	0.996	6/12	16.68%	Gla;Gba2;Fuca2;Cbr4;Abhd10;Fuca1
38	prerank	GOBP_DIGESTIVE_SYSTEM_DEVELOPMENT	0.429413	1.87548	0.0	1.0	0.983	35/94	21.07%	Cdkn1c;Shox2;Ascl1;Sall1;Hlx;Tcf21;Hip1r;Gata4...
36	prerank	GOBP_AUTOPHAGY_OF_NUCLEUS	0.63459	1.879821	0.001695	1.0	0.98	8/15	18.20%	Atg9a;Atg7;Atg2a;Ulk1;Rb1cc1;Ulk2;Wipi2;Wdr45
27	prerank	GOBP_GLYCOSIDE_CATABOLIC_PROCESS	0.778249	1.928485	0.0	1.0	0.887	5/8	16.68%	Gla;Gba2;Fuca2;Abhd10;Fuca1
14	prerank	GOBP_GOLGI_TO_LYSOSOME_TRANSPORT	0.763518	2.009531	0.0	0.926462	0.553	4/10	7.08%	Ehd3;Ccdc91;Sort1;Rbsn

plot_data = filter_res.sort_values('NES').tail(20)
plot_data['Term'] = [x.split('_',1)[1].replace('_', ' ').capitalize() for x in plot_data['Term']]

gp.dotplot(plot_data,column="NOM p-val",top_term=20)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/gseapy/plot.py:738: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[self.colname] = df[self.colname].replace(0, np.nan).bfill()

<Axes: xlabel='NES'>

../_images/53d1ed28b040193d16dcf4d4617c89999cf3714a7d485109f0ad36e85b962050.png

plot_data = filter_res.sort_values('NES').head(20)
plot_data['Term'] = [x.split('_',1)[1].replace('_', ' ').capitalize() for x in plot_data['Term']]

gp.dotplot(plot_data,column="NOM p-val",top_term=20)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/gseapy/plot.py:738: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[self.colname] = df[self.colname].replace(0, np.nan).bfill()

<Axes: xlabel='NES'>

../_images/f12d762a8128e85effd806282dcbac6492915c6a0e81eeb5a9e4643b106d6923.png

A negative NES indicates that genes associated with a given Gene Ontology (GO) term are highly expressed at the start of the trajectory (i.e., cells with low NT scores). For example, enrichment of DNA replication–related terms aligns with the potential of these cells to be preserved as RGCs in the next developmental stage (E16.5).

A positive NES indicates that GO term–associated genes are highly expressed at the end of the trajectory (i.e., cells with high NT scores). Enrichment of terms related to neuron differentiation and CNS development corresponds to the potential differentiation of RGCs into mature neuronal states by E16.5.

Regulon activity changes along spatial trajectory¶

To gain mechanistic insights, we performed gene regulatory network (GRN) analysis using the SCENIC workflow and explore regulon activity changes along spatial trajectory.

E14_RGC_regulon_aucell_df = regulon_aucell_df.loc[target_cells.index]
E14_RGC_regulon_aucell_df = E14_RGC_regulon_aucell_df.join(1-ana_data.NT_score['Cell_NTScore'] if hasattr(ana_data.options, 'reverse')
                       and ana_data.options.reverse else ana_data.NT_score['Cell_NTScore'])
E14_RGC_regulon_aucell_df.head()

	Ahr	Alx1	Alx3	Alx4	Ar	Arid3a	Arntl2	Arx	Atf1	Atf3	...	Zfp874b	Zfp941	Zfp944	Zfp979	Zic1	Zkscan14	Zkscan16	Zscan20	Zxdc	Cell_NTScore
Cell_ID
E14_E1S3_171289	0.0	0.015500	0.014530	0.000000	0.0	0.003071	0.000000	0.072864	0.0	0.156601	...	0.001970	0.000000	0.0	0.0	0.026855	0.056960	0.023259	0.0	0.0	0.105795
E14_E1S3_171863	0.0	0.018301	0.017757	0.032068	0.0	0.003840	0.000000	0.000000	0.0	0.169815	...	0.004407	0.000000	0.0	0.0	0.092449	0.063594	0.026734	0.0	0.0	0.020465
E14_E1S3_171967	0.0	0.011018	0.009434	0.000000	0.0	0.001884	0.000000	0.071136	0.0	0.137201	...	0.018034	0.085787	0.0	0.0	0.033001	0.046866	0.017807	0.0	0.0	0.019447
E14_E1S3_171983	0.0	0.020709	0.000000	0.003596	0.0	0.000000	0.020299	0.099791	0.0	0.034300	...	0.000000	0.000000	0.0	0.0	0.054828	0.000000	0.000000	0.0	0.0	0.041343
E14_E1S3_172013	0.0	0.009773	0.008186	0.000000	0.0	0.001552	0.228530	0.000000	0.0	0.131016	...	0.000000	0.000000	0.0	0.0	0.045381	0.043549	0.016186	0.0	0.0	0.023282

5 rows × 305 columns

E14_RGC_regulon_aucell_metacell_data_df = construct_meta_cell_along_trajectory(meta_data_df = E14_RGC_regulon_aucell_df,
                                                                trajectory = 'Cell_NTScore',
                                                                n_cells = 10)
E14_RGC_regulon_aucell_metacell_data_df.head()

	Ahr	Alx1	Alx3	Alx4	Ar	Arid3a	Arntl2	Arx	Atf1	Atf3	...	Zfp874b	Zfp941	Zfp944	Zfp979	Zic1	Zkscan14	Zkscan16	Zscan20	Zxdc	Cell_NTScore
Cell_ID
E14_E1S3_173789	0.0	0.012552	0.019366	0.011585	0.000000	0.014612	0.004832	0.026460	0.0	0.138414	...	0.019770	0.0	0.0	0.000000	0.079209	0.051178	0.022155	0.0	0.0	0.000230
E14_E1S3_173259	0.0	0.019238	0.021671	0.014015	0.000000	0.003680	0.004246	0.011797	0.0	0.111139	...	0.017413	0.0	0.0	0.000000	0.055038	0.033917	0.016856	0.0	0.0	0.000240
E14_E1S3_174106	0.0	0.010374	0.022217	0.004985	0.014738	0.008334	0.023277	0.013118	0.0	0.117493	...	0.002832	0.0	0.0	0.015804	0.068224	0.063113	0.023028	0.0	0.0	0.000250
E14_E1S3_173417	0.0	0.024025	0.014273	0.005424	0.000000	0.005799	0.005877	0.000000	0.0	0.157988	...	0.002603	0.0	0.0	0.000000	0.046749	0.056830	0.028127	0.0	0.0	0.000266
E14_E1S3_173284	0.0	0.016657	0.011069	0.011577	0.014097	0.004353	0.021781	0.049954	0.0	0.122188	...	0.008053	0.0	0.0	0.019991	0.072346	0.039267	0.028010	0.0	0.0	0.000285

5 rows × 305 columns

regulon_correlated_df = cal_features_correlation_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    rho_threshold=0.4,
    p_val_threshold=0.01
)
print(regulon_correlated_df.shape)

(10, 2)

/sc/arion/work/wangw32/conda-env/envs/ONTraC/lib/python3.11/site-packages/ONTraC/analysis/trajectory.py:102: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  rho, p_val = pearsonr(data_df[trajectory], data_df[feat])

regulon_correlated_df.head()

	PCC	P_Value
Feature
Nfatc4	0.525306	0.000090
Lhx2	0.501632	0.000206
Neurod1	0.476846	0.000464
Sox6	0.414478	0.002767
Tcf7l2	0.402664	0.003743

Diagnosis of Selected Regulons¶

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'En2',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Regulon Activity',
    ci = 95,
)

../_images/7b2210adc7603ef461a38a685fc2de226d99d362d343646e7546f9a7438149bf.png

fig, ax = plot_scatter_feat_along_trajectory(
    data_df = E14_RGC_regulon_aucell_metacell_data_df,
    trajectory = 'Cell_NTScore',
    feature = 'Hes6',
    fit_reg = True,
    annotate_pos = 'upper right',
    figszie = (4,3),
    ylabel = 'Regulon Activity',
    ci = 95,
)

../_images/f0e0148d19bdd5fb2e072f22aa91f53049bac306791706f64abd4830105ff980.png

We identified several regulons whose activity scores are significantly correlated with cell-level NT scores (spatial trajectory).

Session Info¶

import session_info

session_info.show()

Click to view session information

-----
ONTraC              1.2.2
gseapy              1.1.5
matplotlib          3.8.4
numpy               1.26.4
pandas              2.2.1
requests            2.31.0
scipy               1.14.1
seaborn             0.13.2
session_info        1.0.0
statsmodels         0.14.4
-----

Click to view modules imported as dependencies

PIL                 10.3.0
asttokens           NA
certifi             2024.02.02
charset_normalizer  3.3.2
colorama            0.4.6
comm                0.2.2
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
debugpy             1.8.1
decorator           5.1.1
executing           2.0.1
idna                3.7
ipykernel           6.29.4
jedi                0.19.1
kiwisolver          1.4.5
matplotlib_inline   0.1.7
mpl_toolkits        NA
networkx            3.3
packaging           24.0
parso               0.8.4
patsy               0.5.6
platformdirs        4.2.1
prompt_toolkit      3.0.43
psutil              5.9.8
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.17.2
pyparsing           3.1.2
pytz                2024.1
six                 1.16.0
stack_data          0.6.3
tornado             6.4
traitlets           5.14.3
typing_extensions   NA
urllib3             2.2.1
wcwidth             0.2.13
yaml                6.0.1
zmq                 26.0.2

-----
IPython             8.23.0
jupyter_client      8.6.1
jupyter_core        5.7.2
-----
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-5.14.0-427.13.1.el9_4.x86_64-x86_64-with-glibc2.34
-----
Session information updated at 2025-08-24 12:14