SCassist is an R package that combines statistical calculations with LLM-based insights to guide users through single-cell RNA-seq data analysis. The package provides recommendations, annotations, and interpretations intended to make the analysis more efficient and easier to interpret.
# Install the devtools package if you don't have it
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install SCassist from GitHub
devtools::install_github("NIH-NEI/SCassist")
LLM Server Setup:
# Install rollama package to use the local ollama llm server
install.packages("rollama")
# Download the model (in R; requires a running Ollama server)
rollama::pull_model("llama3.1")
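Pulling a model only works when a local Ollama server is reachable. The following is a minimal sketch, not part of SCassist itself, that checks the server first using rollama's `ping_ollama()` and `list_models()`. It assumes Ollama is installed separately and listening on its default address; the `server_up` variable is introduced here purely for illustration.

```r
# Sketch: confirm the local Ollama server is reachable before pulling a model.
# Assumes Ollama was installed separately and runs on its default address.
server_up <- FALSE  # illustrative flag, not used by SCassist

if (requireNamespace("rollama", quietly = TRUE)) {
  # ping_ollama() reports whether the server answers
  server_up <- isTRUE(all(rollama::ping_ollama(silent = TRUE)))

  if (server_up) {
    rollama::pull_model("llama3.1")   # download the model used in this README
    print(rollama::list_models())     # show the models now available locally
  } else {
    message("Ollama does not appear to be running; start it and try again.")
  }
} else {
  message("The rollama package is not installed.")
}
```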
Download example data: NK, CD4+ and CD8+ T cells from LCMV-infected Ifng - CTCF binding site mutant mice - GSM6625298_scRNA_LCMV_Day4_CD4_CD8_NK_WT_filtered_feature_bc_matrix.h5
# Load the SCassist and Seurat packages
library(SCassist)
library(Seurat)
# Load the downloaded example file
KO <- Read10X_h5("GSM6625298_scRNA_LCMV_Day4_CD4_CD8_NK_WT_filtered_feature_bc_matrix.h5", use.names = TRUE)
# Create a Seurat object from the gene expression counts
KO <- CreateSeuratObject(counts = KO[["Gene Expression"]], names.field = 2, names.delim = "\\-")
# Set the api_key_file variable to the file that holds your Google API key
api_key_file <- "api_key_from_google.txt"
# Recommend quality control filters using Gemini (online)
qc_recommendations <- SCassist_analyze_quality("KO", llm_server="google", api_key_file = api_key_file)
# Recommend quality control filters using Llama 3.1 (local, via Ollama)
qc_recommendations <- SCassist_analyze_quality("KO", llm_server="ollama")
# ...and many more functions!
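For orientation, quality control filters in a standard Seurat analysis are usually thresholds on per-cell metrics such as the number of detected genes and the percentage of mitochondrial reads. The sketch below computes and applies such filters on a small synthetic object so that it runs on its own. The object names (`toy`, `toy_filtered`), the toy counts, and the thresholds are hypothetical and chosen only for illustration. It does not show how SCassist derives or returns its recommendations.

```r
library(Seurat)

# Build a small synthetic count matrix (hypothetical data, for illustration only).
set.seed(1)
genes  <- c("mt-Nd1", "mt-Co1", paste0("Gene", 1:48))
cells  <- paste0("cell", 1:30)
counts <- matrix(rpois(length(genes) * length(cells), lambda = 2),
                 nrow = length(genes),
                 dimnames = list(genes, cells))

toy <- CreateSeuratObject(counts = counts)

# Mouse mitochondrial genes carry the "mt-" prefix (human genes use "MT-").
toy[["percent.mt"]] <- PercentageFeatureSet(toy, pattern = "^mt-")

# Inspect the per-cell metrics that QC thresholds are normally based on.
print(summary(toy$nFeature_RNA))
print(summary(toy$percent.mt))

# Apply illustrative thresholds; real values should come from your own data.
toy_filtered <- subset(toy, subset = nFeature_RNA > 20 & percent.mt < 10)
print(ncol(toy_filtered))
```

On real data, the same calls apply to the `KO` object created above, with thresholds chosen from the observed distributions or from the recommendations you receive.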
Step-by-step tutorials documenting the full workflow for the example datasets are provided below:
Old Seurat Workflows, for comparison:
The workflows below are the original, standard Seurat workflows. We used them as a baseline to evaluate the SCassist-based workflow.
Detailed documentation for each function, including parameters, usage, and expected outputs, is available through the ? help function in R. For example, run ?SCassist to learn about all the included functions, or run ?SCassist_analyze_quality to learn about the syntax, parameters, expected inputs, defaults, and outputs of the function that analyzes the quality of your single-cell data and recommends filtering options.
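Beyond the `?` shortcut, base R can list a package's exported functions and print a function's arguments. A minimal sketch, assuming SCassist is already installed; the `exported` variable is introduced here for illustration:

```r
# Sketch: explore the SCassist documentation programmatically.
exported <- character(0)  # illustrative variable holding the exported names

if (requireNamespace("SCassist", quietly = TRUE)) {
  # List every function the package exports
  exported <- sort(getNamespaceExports("SCassist"))
  print(exported)

  # Equivalent to ?SCassist_analyze_quality
  print(help("SCassist_analyze_quality", package = "SCassist"))

  # Show the function's arguments and their defaults
  print(args(SCassist::SCassist_analyze_quality))
} else {
  message("SCassist is not installed; see the installation steps above.")
}
```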
The license for this package can be found in the LICENSE file within the package directory.