Running cell2location on NanostringWTA data¶

In this notebook we we map fetal brain cell types to regions of interest (ROIs) profiled with the NanostringWTA technology, using a version of our cell2location method recommended for probe based spatial transcriptomics data. This notebook should be read after looking at the main cell2location notebooks.

Load the required modules and configure theano settings:

[1]:

import sys,os
import pickle
import anndata
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scanpy as sc
from IPython.display import Image
data_type = 'float32'
os.environ["THEANO_FLAGS"] = 'device=cuda,floatX=' + data_type + ',force_device=True' + ',dnn.enabled=False'
import cell2location

Can not use cuDNN on context None: Disabled by dnn.enabled flag
Mapped name None to device cuda: Tesla V100-SXM2-32GB (0000:89:00.0)

[6]:

# Download data:
os.mkdir('./data')
os.system('cd ./data && wget https://cell2location.cog.sanger.ac.uk/nanostringWTA/nanostringWTA_fetailBrain_AnnData_smallestExample.p')
os.system('cd ./data && wget https://cell2location.cog.sanger.ac.uk/nanostringWTA/polioudakis2019_meanExpressionProfiles.csv')

[6]:

Load a data:

[7]:

adata_wta = pickle.load(open("data/nanostringWTA_fetailBrain_AnnData_smallestExample.p", "rb" ))

In this NanostringWTA run we profiled 288 regions of interest (ROIs) spanning the full depth of the cortex and at both 19pcw and 14pcw. An example is shown in the image below, together with the cell types we expect in each region:

[8]:

Image(filename='../images/GeometricROIs.PNG')

[8]:

../_images/notebooks_cell2location_for_NanostringWTA_7_0.png

Here we load an existing matrix of average gene expression profiles for each cell type expected in our nanostringWTA data (taken from the single-cell RNAseq study Polioudakis et al., Neuron, 2019):

[11]:

meanExpression_sc = pd.read_csv("data/polioudakis2019_meanExpressionProfiles.csv", index_col=0)

We need seperate gene probes and negative probes and if available we can also supply nuclei counts. We initialize all of those here:

[12]:

counts_negativeProbes = np.asarray(adata_wta[:,np.array(adata_wta.var_names =='NegProbe-WTX').squeeze()].X)
counts_nuclei = np.asarray(adata_wta.obs['nuclei']).reshape(len(adata_wta.obs['nuclei']),1)
adata_wta = adata_wta[:,np.array(adata_wta.var_names != 'NegProbe-WTX').squeeze()]

As you can see the nuclei counts and negative probes need to be numpy arrays, but the gene probe counts are supplied as an AnnData object.

[13]:

adata_wta.raw = adata_wta

Run cell2location:

Explanations of the arguments are as follows: ‘model_name = cell2location.models.LocationModelWTA’ > Here we tell cell2location to use the NanostringWTA model, rather than the standard model. ‘use_raw’: False > extract the data from adata_wta.X and not from adata_wta.raw.X ‘Y_data’: counts_negativeProbes > we supply the negative probe information here ‘cell_number_prior’ and ‘cell_number_var_prior’: we supply information about nuclei counts here

[14]:

cell2location.run_c2l.run_cell2location(meanExpression_sc, adata_wta,
        model_name=cell2location.models.LocationModelWTA,
        train_args={'use_raw': False},
        model_kwargs={
        "Y_data" : counts_negativeProbes,
        "cell_number_prior" : {'cells_per_spot': counts_nuclei, 'factors_per_spot': 6, 'combs_per_spot': 3},
        "cell_number_var_prior" : {'cells_mean_var_ratio': 1, 'factors_mean_var_ratio': 1, 'combs_mean_var_ratio': 1}})

### Summarising single cell clusters ###
### Creating model ### - time 0.0 min

WARNING (theano.gof.cmodule): ModuleCache.refresh() Found key without dll in cache, deleting it. /nfs/users/nfs_a/aa16/.theano/compiledir_Linux-4.15--generic-x86_64-with-debian-buster-sid-x86_64-3.7.6-64/tmp17b1kxnq/key.pkl
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '57693' (I am process '49779')

### Analysis name: LocationModelWTA_1experiments_16clusters_288locations_15124genes
### Training model ###

Average Loss = 5.0222e+07: 100%|██████████| 20000/20000 [03:16<00:00, 101.68it/s]
Finished [100%]: Average Loss = 5.0222e+07

[<matplotlib.lines.Line2D object at 0x1506a7fc3f50>]

Average Loss = 5.0222e+07: 100%|██████████| 20000/20000 [03:16<00:00, 101.73it/s]
Finished [100%]: Average Loss = 5.0222e+07

[<matplotlib.lines.Line2D object at 0x150674f14a90>]

/nfs/team283/aa16/software/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location-0.5-py3.7.egg/cell2location/models/base/pymc3_model.py:449: MatplotlibDeprecationWarning: Passing non-integers as three-element position specification is deprecated since 3.3 and will be removed two minor releases later.
  plt.subplot(nrow, ncol, i + 1)
/nfs/team283/aa16/software/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location-0.5-py3.7.egg/cell2location/models/base/pymc3_model.py:450: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  plt.subplot(np.ceil(n_plots / ncol), ncol, i + 1)

### Sampling posterior ### - time 7.79 min

... storing 'panel' as categorical
... storing 'construct' as categorical
... storing 'instrument_type' as categorical
... storing 'read_pattern' as categorical
... storing 'ngs_prep' as categorical
... storing 'pcr_primer_plate' as categorical
... storing 'pcr_primer_well' as categorical
... storing 'Index 1' as categorical
... storing 'Barcode 1' as categorical
... storing 'Index 2' as categorical
... storing '\tBarcode 2' as categorical
... storing 'segment_type' as categorical
... storing 'segment' as categorical
... storing 'aoi' as categorical
... storing 'dsp_date' as categorical
... storing 'dsp_well' as categorical
... storing 'slide' as categorical
... storing 'human_sample_ID' as categorical
... storing 'slide_barcode' as categorical
... storing 'age' as categorical
... storing 'source_ID' as categorical
... storing 'Sanger_sampleID' as categorical

### Saving results ###

... storing 'AOI_type' as categorical
... storing 'Region' as categorical
... storing 'Plate' as categorical
... storing 'sample' as categorical

### Ploting results ###
[<matplotlib.lines.Line2D object at 0x1506745ffa50>]
[<matplotlib.lines.Line2D object at 0x150674602410>]
[<matplotlib.lines.Line2D object at 0x150674551090>]
[<matplotlib.lines.Line2D object at 0x1506746c5110>]
### Plotting posterior of W / cell locations ###
Some error in plotting with scanpy or `cell2location.plt.plot_factor_spatial()`
 KeyError('coords')
### Done ### - time 8.25 min

[14]:

{'sc_obs':                       End     ExDp1     ExDp2       ExM     ExM-U       ExN  \
 TSPAN6           0.088608  0.016184  0.000000  0.048768  0.026196  0.036418
 DPM1             0.059072  0.130456  0.078313  0.100489  0.101936  0.079940
 SCYL3            0.012658  0.022070  0.030120  0.021890  0.029043  0.014707
 C1orf112         0.000000  0.009809  0.012048  0.006516  0.009112  0.005103
 FGR              0.016878  0.000490  0.000000  0.000000  0.000000  0.000000
 ...                   ...       ...       ...       ...       ...       ...
 ENSG00000276144  0.000000  0.000000  0.000000  0.000102  0.000569  0.000100
 SNORD114-7       0.000000  0.000000  0.006024  0.000000  0.000000  0.000000
 ZNF965P          0.000000  0.000000  0.000000  0.000102  0.000000  0.000100
 GOLGA8K          0.000000  0.000981  0.000000  0.000102  0.000000  0.000000
 GOLGA8J          0.000000  0.000000  0.000000  0.000000  0.000000  0.000200

                     InCGE     InMGE        IP       Mic       OPC       oRG  \
 TSPAN6           0.012552  0.011144  0.089767  0.000000  0.075163  0.241299
 DPM1             0.085077  0.073900  0.113953  0.062500  0.183007  0.169374
 SCYL3            0.013947  0.017009  0.019535  0.000000  0.049020  0.035576
 C1orf112         0.007671  0.004692  0.007907  0.020833  0.026144  0.028616
 FGR              0.000000  0.000587  0.000000  0.000000  0.000000  0.000000
 ...                   ...       ...       ...       ...       ...       ...
 ENSG00000276144  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
 SNORD114-7       0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
 ZNF965P          0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
 GOLGA8K          0.000000  0.000000  0.000000  0.000000  0.000000  0.000773
 GOLGA8J          0.000000  0.000000  0.000000  0.000000  0.000000  0.000000

                       Per     PgG2M       PgS       vRG
 TSPAN6           0.043860  0.162590  0.125000  0.228659
 DPM1             0.157895  0.276259  0.270292  0.155488
 SCYL3            0.026316  0.021583  0.034903  0.034553
 C1orf112         0.017544  0.046043  0.051136  0.009146
 FGR              0.000000  0.000000  0.000000  0.000000
 ...                   ...       ...       ...       ...
 ENSG00000276144  0.000000  0.000000  0.000000  0.000000
 SNORD114-7       0.000000  0.001439  0.000000  0.000000
 ZNF965P          0.000000  0.000000  0.000000  0.000000
 GOLGA8K          0.000000  0.000000  0.000000  0.000000
 GOLGA8J          0.000000  0.000000  0.000812  0.000000

 [35543 rows x 16 columns],
 'model_name': cell2location.models.LocationModelWTA.LocationModelWTA,
 'summ_sc_data_args': {'cluster_col': 'annotation_1',
  'which_genes': 'intersect',
  'selection': 'cluster_specificity',
  'select_n': 5000,
  'select_n_AutoGeneS': 1000,
  'selection_specificity': 0.07,
  'cluster_markers_kwargs': {}},
 'train_args': {'mode': 'normal',
  'use_raw': False,
  'data_type': 'float32',
  'n_iter': 20000,
  'learning_rate': 0.005,
  'total_grad_norm_constraint': 200,
  'method': 'advi',
  'sample_prior': False,
  'n_prior_samples': 10,
  'n_restarts': 2,
  'n_type': 'restart',
  'tracking_every': 1000,
  'tracking_n_samples': 50,
  'readable_var_name_col': None,
  'sample_name_col': 'sample',
  'fact_names': None,
  'minibatch_size': None},
 'posterior_args': {'n_samples': 1000,
  'evaluate_stability_align': False,
  'mean_field_slot': 'init_1'},
 'export_args': {'path': './results',
  'plot_extension': 'png',
  'scanpy_plot_vmax': 'p99.2',
  'scanpy_plot_size': 1.3,
  'save_model': False,
  'save_spatial_plots': True,
  'run_name_suffix': '',
  'export_q05': True,
  'export_mean': False,
  'scanpy_coords_name': 'coords',
  'img_key': 'hires'},
 'run_name': 'LocationModelWTA_1experiments_16clusters_288locations_15124genes',
 'run_time': '8.23 min',
 'mod': <cell2location.models.LocationModelWTA.LocationModelWTA at 0x1506fb6ba690>}

An anndata object that has the cell2location results included is saved and can be used for further analysis, as in standard cell2location:

[2]:

adata_c2l = sc.read_h5ad('resultsLocationModelWTA_1experiments_16clusters_288locations_15124genes/sp.h5ad')

[19]:

adata_c2l.obs.loc[:,['mean_spot_factors' in c for c in adata_c2l.obs.columns]]

[19]:

	mean_spot_factorsEnd	mean_spot_factorsExDp1	mean_spot_factorsExDp2	mean_spot_factorsExM	mean_spot_factorsExM-U	mean_spot_factorsExN	mean_spot_factorsInCGE	mean_spot_factorsInMGE	mean_spot_factorsIP	mean_spot_factorsMic	mean_spot_factorsOPC	mean_spot_factorsoRG	mean_spot_factorsPer	mean_spot_factorsPgG2M	mean_spot_factorsPgS	mean_spot_factorsvRG
0	0.084664	0.189308	0.029214	0.416038	2.362057	11.295574	16.262985	7.932218	41.685520	0.742107	0.223058	0.236820	0.672671	3.205585	12.560131	13.601164
1	21.419926	7.724943	6.721371	0.333573	1.066788	0.770727	48.214741	42.975559	70.506340	8.807492	14.691536	521.252258	13.545335	49.975002	10.506875	487.543640
2	39.654678	19.112091	12.900160	21.181322	16.252541	434.558960	340.707214	129.871735	349.923279	25.703894	21.041853	290.028412	20.612944	57.256882	78.789604	84.983528
3	51.285645	8.662683	15.469158	3.503401	14.038322	896.294128	295.430023	152.729813	521.571167	30.252962	31.993980	456.301239	28.588427	86.054306	129.775269	132.642563
4	25.595606	23.515396	48.510769	188.190826	26.642292	499.026337	80.059036	41.910606	116.886253	13.355888	19.981899	98.308342	17.650003	32.258457	86.578354	41.804352
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
283	0.028876	0.035472	0.006267	0.040910	0.031838	0.053754	0.025631	0.023575	0.035022	0.027488	0.013950	0.017187	0.010829	0.009468	0.020645	0.021043
284	0.043387	0.037010	0.013416	0.059270	0.047244	0.050790	0.043571	0.042204	0.043056	0.029641	0.020657	0.024773	0.011805	0.069313	0.048026	0.035278
285	0.020425	0.013803	0.007025	0.006270	0.005928	0.015476	0.014805	0.012487	0.010647	0.018965	0.005501	0.009071	0.007763	0.003181	0.005728	0.010816
286	0.015542	0.010659	0.007960	0.008583	0.007580	0.010384	0.013900	0.010033	0.013732	0.011912	0.007894	0.009918	0.004368	0.002940	0.005661	0.007249
287	0.012233	0.011973	0.004245	0.006267	0.014237	0.012098	0.014255	0.018113	0.009493	0.012822	0.006665	0.007838	0.003509	0.004466	0.008221	0.009287

288 rows × 16 columns

We can also plot the same QC metrics as in standard cell2location:

[21]:

from IPython.display import Image
Image(filename='resultsLocationModelWTA_1experiments_16clusters_288locations_15124genes/plots/training_history_without_first_20perc.png',
      width=400)

[21]:

../_images/notebooks_cell2location_for_NanostringWTA_21_0.png

[22]:

Image(filename='resultsLocationModelWTA_1experiments_16clusters_288locations_15124genes/plots/data_vs_posterior_mean.png',
      width=400)

[22]:

../_images/notebooks_cell2location_for_NanostringWTA_22_0.png

[23]:

Image(filename='resultsLocationModelWTA_1experiments_16clusters_288locations_15124genes/plots/evaluate_stability.png',
      width=400)

[23]:

../_images/notebooks_cell2location_for_NanostringWTA_23_0.png

Finally we have this method to plot cell type abundance in 1D across our coordinate of interest (cortical depth):

It makes most sense to plot a subset, for example on one 19pcw slide at one position:

[3]:

colourCode = {'SPN': 'salmon', 'End': 'darkcyan', 'ExDp1': 'deepskyblue', 'ExDp2': 'blue', 'ExM': 'gold', 'ExM-U': 'yellow',
              'ExN': 'darkorange', 'InCGE': 'darkgrey', 'InMGE': 'dimgray', 'IP': 'darkviolet',
              'Mic': 'indianred', 'OPC': 'lightcoral', 'oRG': 'red', 'Per': 'darkgreen',
              'PgG2M': 'rebeccapurple', 'PgS': 'violet', 'vRG': 'lightgreen'}

[4]:

subset_19pcw = [adata_c2l.obs['slide'].iloc[i] == '00MU' and
         adata_c2l.obs['Radial_position'].iloc[i] == 2 for i in range(len(adata_c2l.obs['Radial_position']))]

[5]:

cell2location.plt.plot_absolute_abundances_1D(adata_c2l, subset = subset_19pcw, saving = False,
                               scaling = 0.15, power = 1, pws = [0,0,100,500,1000,3000,6000], figureSize = (12,8),
                               dimName = 'VCDepth', xlab = 'Cortical Depth', colourCode = colourCode)

../_images/notebooks_cell2location_for_NanostringWTA_28_0.png

You can also plot density rather than total numbers:

[7]:

cell2location.plt.plot_density_1D(adata_c2l, subset = subset_19pcw, saving = False,
                               scaling = 0.05, power = 1, pws = [0,0,100,500,1000,3000,6000,10000], figureSize = (12,8),
                               dimName = 'VCDepth', areaName = 'roi_dimension', xlab = 'Cortical Depth',
                colourCode = colourCode)

../_images/notebooks_cell2location_for_NanostringWTA_30_0.png

[ ]: