# Reference signatures (hard-coded, cell type hierarchy)

## Compute per cluster average expression (average RNA counts)

cell2location.cluster_averages.cluster_averages.compute_cluster_averages(adata, labels, use_raw=True, layer=None)[source]

Compute average expression of each gene in each cluster

Parameters
• adata – AnnData object of reference single-cell dataset

• labels – Name of adata.obs column containing cluster labels

• use_raw – Use the raw slot in adata.

• layer – Use layer in adata, provide layer name.

Returns

Return type

pd.DataFrame of cluster average expression of each gene

cell2location.cluster_averages.cluster_averages.get_cluster_variances(adata, labels, use_raw=True, layer=None)[source]

Compute variance of each gene in each cluster

Parameters
• adata – AnnData object of reference single-cell dataset

• labels – Name of adata.obs column containing cluster labels

• use_raw – Use the raw slot in adata.

• layer – Use layer in adata, provide layer name.

Returns

Return type

pd.DataFrame of within cluster variance of each gene

cell2location.cluster_averages.cluster_averages.get_cluster_averages_df(X, cluster_col)[source]
Parameters
• X – DataFrame with spots / cells in rows and expression dimensions in columns

• cluster_col – pd.Series object containing cluster labels

Returns

pd.DataFrame of cluster average expression of each gene
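As an illustration of what a per-cluster average means for inputs shaped like `X` and `cluster_col`, the following is a minimal pandas sketch. It does not call cell2location and is not the library's implementation; the toy data, the variable names and the genes-by-clusters orientation of the result are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data: 4 cells (rows) x 2 genes (columns).
X = pd.DataFrame(
    {"gene_a": [1.0, 3.0, 10.0, 20.0], "gene_b": [0.0, 2.0, 4.0, 4.0]},
    index=["cell1", "cell2", "cell3", "cell4"],
)

# Cluster label of every cell, indexed like the rows of X.
cluster_col = pd.Series(["T", "T", "B", "B"], index=X.index)

# Mean of each gene within each cluster. The transpose puts genes in rows
# and clusters in columns (this orientation is an assumption).
averages = X.groupby(cluster_col).mean().T
print(averages)
```

In this toy case `gene_a` averages 2.0 in cluster `T` and 15.0 in cluster `B`.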

cell2location.cluster_averages.cluster_averages.get_cluster_variances_df(X, cluster_col)[source]
Parameters
• X – DataFrame with spots / cells in rows and expression dimensions in columns

• cluster_col – pd.Series object containing cluster labels

Returns

pd.DataFrame of within cluster variances of each gene
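The within-cluster variance can be illustrated the same way. This is a sketch under assumptions, not the library's code: the toy data is hypothetical, and the choice of population variance (`ddof=0`) is a guess, since the documentation above does not say which estimator is used.

```python
import pandas as pd

# Hypothetical toy data: 4 cells (rows) x 2 genes (columns).
X = pd.DataFrame(
    {"gene_a": [1.0, 3.0, 10.0, 20.0], "gene_b": [0.0, 2.0, 4.0, 4.0]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
cluster_col = pd.Series(["T", "T", "B", "B"], index=X.index)

# Variance of each gene within each cluster.
# ddof=0 (population variance) is an assumption; ddof=1 would give the
# unbiased sample variance instead.
variances = X.groupby(cluster_col).var(ddof=0).T
print(variances)
```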

## Decompose reference signatures according to levels of cell type hierarchy

cell2location.cluster_averages.markers_by_hierarhy.markers_by_hierarhy(inf_aver, var_names, hierarhy_df, quantile=[0.05, 0.1, 0.2], mode='exclusive')[source]

Find which genes are expressed at which level of the cell type hierarchy. The function assigns expression counts for each gene to higher levels of the hierarchy using estimates of average expression for the lowest level, and subtracts that expression from the lowest level. For example, the low-level annotation can be Inh_SST neurones, the high level Inh neurones, the very high level neurones, and the top level all cell types. The function can deal with any number of layers, but the order needs to be carefully considered (from broad to specific).

$g_{g} = \min\limits_{f} g_{f,g}$
$g_{fn,g} = (\min\limits_{f∈fn} g_{f,g}) - g_{g}$
$...$
$g_{f3,g} = (\min\limits_{f∈f3} g_{f,g}) - ... - g_{fn,g} - g_{g}$
$g_{f2,g} = (\min\limits_{f∈f2} g_{f,g}) - g_{f3,g} - ... - g_{fn,g} - g_{g}$
$g_{f1, g} = g_{f,g} - g_{f2,g} - g_{f3,g} - ... - g_{fn,g} - g_{g}$

Here, $$g_{f,g}$$ represents average expression of each gene in each level 1 cluster. $$g_{f1,g}$$ represents average expression of each gene unique to each level 1 cluster. $$g_{f2,g}$$ represents average expression of each gene unique to each level 2 cluster. $$g_{f3,g}$$ represents average expression of each gene unique to each level 3 cluster. $$g_{fn,g}$$ represents average expression of each gene unique to each level n cluster (can be deep). $$g_{g}$$ represents average expression of each gene unique to the top level (all cells).

Parameters
• inf_aver – np.ndarray with $$g_{g,f}$$ or with $$g_{g,f,s}$$ where s represents posterior samples

• var_names – list, array or index with variable names

• hierarhy_df – pd.DataFrame that provides the mapping between clusters at different levels. The index corresponds to level 1 $$f1$$, the first column to the top level, the second column to the n-th level $$fn$$, and the last column to the second level $$f2$$. It is crucial that the order of cell types $$f$$ in hierarhy_df matches the order of cell types in axis 1 of inf_aver.

• quantile – list of posterior distribution quantiles to be computed

• mode – ‘exclusive’ or ‘tree’ mode. In ‘exclusive’ mode, the number of counts specific to each layer is computed (e.g. counts at layer 2 are excluded from layer 1). In ‘tree’ mode, child nodes inherit the expression of their parents (e.g. layer 1 contains the original counts $$g_{f,g}$$, layer 2 contains counts from all parent layers $$g_{f2,g} + g_{f3,g} + ... + g_{fn,g} + g_{g}$$).

Returns

When the input is $$g_{g,f}$$, the output is a pd.DataFrame with values for $$f1, f2, f3, ..., fn, all$$. When the input is $$g_{g,f,s}$$, where s represents posterior samples, the output is a dictionary with posterior samples for $$g_{g,f1-fn+all,s}$$ and similar dataframes for ‘mean’ and quantiles of the posterior distribution (e.g. ‘q0.05’).
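The decomposition equations above can be traced on a small example. The sketch below applies them by hand, in ‘exclusive’ mode, to a two-level hierarchy (level 1 clusters, level 2 groups, and the top level). It is an illustration with plain pandas, not the library's implementation and not a call to `markers_by_hierarhy`; the toy values, the cluster names and the `level2` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical average expression g_{f,g}: genes in rows, level 1 clusters in columns.
g = pd.DataFrame(
    [[5.0, 5.0, 5.0],
     [4.0, 6.0, 1.0],
     [0.0, 3.0, 2.0]],
    index=["gene_a", "gene_b", "gene_c"],
    columns=["Inh_SST", "Inh_PV", "Astro"],
)

# Hypothetical mapping from each level 1 cluster to its level 2 parent.
level2 = pd.Series({"Inh_SST": "Inh", "Inh_PV": "Inh", "Astro": "Glia"})

# Top level: g_g = min over all level 1 clusters.
g_all = g.min(axis=1)

# Level 2: g_{f2,g} = (min over the level 1 clusters in each group) - g_g.
g_l2 = g.T.groupby(level2).min().T.sub(g_all, axis=0)

# Level 1: g_{f1,g} = g_{f,g} - g_{f2,g} - g_g, using each cluster's own parent.
parents = level2.loc[g.columns].tolist()
g_l1 = g - g_l2[parents].to_numpy() - g_all.to_numpy()[:, None]

# 'tree' mode for level 2: the group inherits the top-level expression.
tree_l2 = g_l2.add(g_all, axis=0)

print(g_all, g_l2, g_l1, sep="\n\n")
```

Here `gene_a` is expressed equally everywhere, so all of its counts land at the top level; `gene_b` keeps 3 counts at the `Inh` level and 2 counts unique to `Inh_PV`. Adding the three exclusive layers back together recovers the original `g`.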

## Select genes based on DE (experimental)

cell2location.cluster_averages.select_features.select_features(adata, groupName, n_features=10000, use_raw=True, verbose=False, sc_kwargs={})[source]

#TODO Write docstring