Last updated: 2024-12-24

Checks: 5 passed, 1 failed

Knit directory: proj_distal/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(12345)

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
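
As a small illustration of why the seed matters (a sketch only, not part of the analysis code), resetting the same seed before a random draw reproduces that draw exactly:

```r
# Draw a random subsample twice, resetting the seed before each draw.
set.seed(12345)
first_draw <- sample(100, 5)

set.seed(12345)
second_draw <- sample(100, 5)

# Same seed, same sequence of random numbers, so the two draws match.
seeds_match <- identical(first_draw, second_draw)
print(seeds_match)
```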

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: no version control

Tracking code development and connecting the code version to the results is critical for reproducibility. To start using Git, open the Terminal and type git init in your project directory.

This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.

This page describes how to download the data and code used in this analysis, set up the project directory and rerun the analysis. I have used the workflowr package to organize the analysis and insert reproducibility information into the output documents. The packrat package has also been used to manage R package versions and dependencies.

Getting the code

All the code and outputs of analyses are available from GitHub at https://github.com/MilpiedLab/Autoreactive-CD4-T-cells-in-liver-disease. If you want to replicate the analysis you can either fork the repository and clone it or download the repository as a zipped directory.

Once you have a local copy of the repository you should see the following directory structure:

analysis/ - Contains the R Markdown documents with the various stages of analysis. These are numbered according to the order they should be run.
data/ - This directory contains the data files used in the analysis with each dataset in its own sub-directory (see Getting the data for details). Processed intermediate data files will also be placed here.
output/ - Directory for output files produced by the analyses; each analysis step has its own sub-directory.
docs/ - This directory contains the analysis website, including image files.
R/ - R scripts with custom functions used in some analysis stages.
scripts/ - Python scripts and examples of how command line tools were run.
packrat/ - Directory created by packrat that contains details of the R packages and versions used in the analyses.
README.md - README describing the project.
.Rprofile - Custom R profile for the project including set up for packrat and workflowr.
.gitignore - Details of files and directories that are excluded from the repository.
proj_distal.Rproj - RStudio project file.
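
One way to confirm that a local copy matches the layout listed above is a quick check from R, run in the project root. This is a minimal sketch: the helper name `missing_entries` is hypothetical and not part of the repository, and the expected entries are simply copied from the list.

```r
# Top-level entries described in the directory listing.
expected_entries <- c("analysis", "data", "output", "docs", "R", "scripts",
                      "packrat", "README.md", ".Rprofile", ".gitignore",
                      "proj_distal.Rproj")

# Hypothetical helper: return the expected entries that are missing from
# the project root (files and directories are treated alike).
missing_entries <- function(root = ".", expected = expected_entries) {
  expected[!file.exists(file.path(root, expected))]
}

missing <- missing_entries(".")
if (length(missing) > 0) {
  message("Missing from project root: ", paste(missing, collapse = ", "))
} else {
  message("All expected files and directories are present.")
}
```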

Installing R packages

R packages and dependencies for this project are managed using packrat. This should allow you to install and use the same package versions as we have used for the analysis. packrat should automatically take care of this process for you the first time that you open R in the project directory. If for some reason this does not happen you may need to run the following commands:

install.packages("packrat")  # only needed if packrat is not already installed
packrat::restore()           # install the package versions recorded by packrat

Note that a clean install of all the required packages can take a significant amount of time when the project is first opened.

Getting the data

The raw sequencing data from this project is available on NCBI GEO under accession numbers GSE270739, GSE269661 and GSE269525. Some preprocessing and quality control of the datasets was done to produce datasets in a form suitable for the analyses that are presented here. If you don’t want to perform the preprocessing and quality control steps yourself, the processed datasets are available from this Zenodo repository. This repository also contains intermediate files from the statistical analysis.

Once the processed data has been produced or downloaded it needs to be placed in the correct location. The analysis code assumes the following directory structure inside the data/ directory:

processed/ - Input processed data required to run the analyses, and output datasets after analysis. Files are named and numbered according to the figure in which the results are described.
- figure2_input_UMI.csv - CSV expression matrix of selected cells (rows) by all genes (columns) after FB5P-seq quality control; the raw data used for the clustering analysis
- figure2_input_metadata.csv - CSV file with the corresponding metadata produced by FB5P-seq quality control
- figure2_output_seurat.rds - Seurat object with cluster labels
- figure2_output_metadata.RData - RData metadata file with results from the clustering analysis
- figure2_group_gene_markers.csv - CSV file with gene markers
- figure4_input_seurat.rds - Seurat object used for the gene set score analysis
- figure4_output_seurat.rds - Seurat object with the results of the gene set score analysis
- figure4_output_metadata.RData - RData metadata file with results of the gene set score analysis
references/ - References mentioned during the analysis and on the website
- references.bib - BibTex file of references
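
Once the files are in place, they can be loaded with base R. The sketch below is illustrative only: `processed_path` and `read_umi` are hypothetical helpers, not functions from the repository, and it assumes that the first column of the UMI CSV holds the cell identifiers.

```r
# Hypothetical helper: build the path to a processed file and fail early
# with a clear message if the file has not been put in place yet.
processed_path <- function(file, data_dir = "data") {
  path <- file.path(data_dir, "processed", file)
  if (!file.exists(path)) {
    stop("Expected data file not found: ", path, call. = FALSE)
  }
  path
}

# Assumption: the first column of the UMI CSV holds the cell identifiers,
# so it is used as row names (cells in rows, genes in columns).
read_umi <- function(data_dir = "data") {
  read.csv(processed_path("figure2_input_UMI.csv", data_dir),
           row.names = 1, check.names = FALSE)
}

# Usage, once the files are in place:
# umi     <- read_umi()
# fig2_so <- readRDS(processed_path("figure2_output_seurat.rds"))
```

Failing early on a missing file gives a clearer error than the one raised later, partway through knitting a document.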

Running the analysis

The analysis directory contains the following analysis files:

02-fig2.html - Reading of datasets produced using FB5P-seq, annotation of the dataset.
03-fig4.html - Reading of datasets produced using Flash-FB5P-seq (v2), selection of high-quality cells, then scoring of the dataset.
04-suppfig2.html - Highlighting of selected marker genes in the cell clusters.
06-suppfig4.html - Literature-based gene set score analysis (FB5P-seq).
07-suppfig14.html - Literature-based gene set score analysis (Flash).

As indicated by the numbering they should be run in this order. If you want to rerun the entire analysis this can be easily done using workflowr.

# Rebuild the analysis with workflowr
workflowr::wflow_build(republish = TRUE)
# Alternative without workflowr:
# rmarkdown::render_site('analysis/')

It is also possible to run individual stages of the analysis, either by providing the name of the file you want to run to workflowr::wflow_build() or by manually knitting the document (for example using the ‘Knit’ button in RStudio).
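
As a rough sketch of running selected stages, the numbered R Markdown files can be listed in order and passed to workflowr::wflow_build(). The helper `ordered_analysis_files` is hypothetical, and the `.Rmd` file names are an assumption based on the numbered names listed above.

```r
# Hypothetical helper: list the R Markdown files in a directory whose names
# start with a number, ordered by that number.
ordered_analysis_files <- function(dir = "analysis") {
  files <- list.files(dir, pattern = "^[0-9]+-.*\\.Rmd$", full.names = TRUE)
  step <- as.integer(sub("-.*$", "", basename(files)))
  files[order(step)]
}

files <- ordered_analysis_files("analysis")
print(files)

# Build a single stage (file name assumed), or pass several files at once:
# workflowr::wflow_build("analysis/02-fig2.Rmd")
# workflowr::wflow_build(files)
```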

Caching

To avoid having to repeatedly re-run long-running sections of the analysis I have turned on caching in the analysis documents. However, this comes at a cost in disk space, usability and (potentially, though unlikely if you are careful) reproducibility. In most cases this should not be a problem but it is something to be aware of. In particular there is an incompatibility between caching and workflowr that can cause images to not appear in the resulting HTML files (see this GitHub issue for more details). If you have already run part of the analysis (and therefore have a cache) and want to rerun a document, the safest option is to use the RStudio ‘Knit’ button.
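
To see how much disk space the caches take, something like the following can be run from the project root. It is a sketch under assumptions: `cache_summary` is a hypothetical helper, and it assumes the caches sit next to the analysis documents in directories named `<document>_cache`, which is where R Markdown places them by default.

```r
# Hypothetical helper: find knitr cache directories (named "<document>_cache")
# and report their size on disk in megabytes.
cache_summary <- function(dir = "analysis") {
  caches <- list.dirs(dir, recursive = FALSE)
  caches <- caches[grepl("_cache$", caches)]
  size_mb <- vapply(caches, function(d) {
    files <- list.files(d, recursive = TRUE, full.names = TRUE, all.files = TRUE)
    sum(file.size(files)) / 1024^2
  }, numeric(1))
  data.frame(cache = basename(caches), size_mb = unname(size_mb),
             stringsAsFactors = FALSE)
}

print(cache_summary("analysis"))

# To rebuild one document from scratch, workflowr can delete its cache first
# (file name assumed):
# workflowr::wflow_build("analysis/02-fig2.Rmd", delete_cache = TRUE)
```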

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8       rstudioapi_0.13  knitr_1.37       magrittr_2.0.2  
 [5] workflowr_1.7.1  R6_2.5.1         rlang_1.1.1      fastmap_1.1.0   
 [9] fansi_1.0.2      stringr_1.4.0    tools_4.1.2      xfun_0.30       
[13] utf8_1.2.2       cli_3.6.1        git2r_0.33.0     jquerylib_0.1.4 
[17] htmltools_0.5.2  ellipsis_0.3.2   rprojroot_2.0.2  yaml_2.3.5      
[21] digest_0.6.29    tibble_3.1.8     lifecycle_1.0.3  crayon_1.5.0    
[25] later_1.3.0      sass_0.4.0       vctrs_0.6.4      promises_1.2.0.1
[29] fs_1.5.2         glue_1.6.2       evaluate_0.15    rmarkdown_2.11  
[33] stringi_1.7.6    bslib_0.3.1      compiler_4.1.2   pillar_1.7.0    
[37] jsonlite_1.8.0   httpuv_1.6.5     pkgconfig_2.0.3
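
If results differ on another machine, a first step is to compare the local setup with the session information above. The following is a minimal sketch: `compare_versions` is a hypothetical helper, and the three packages checked are only a sample copied from the listing.

```r
# Versions copied from the session information above.
expected_versions <- c(workflowr = "1.7.1", rmarkdown = "2.11", knitr = "1.37")

# Hypothetical helper: compare installed package versions with expected ones.
# Packages that are not installed are reported with installed = NA.
compare_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  installed <- unname(installed)
  data.frame(package = names(expected),
             expected = unname(expected),
             installed = installed,
             matches = !is.na(installed) & installed == unname(expected),
             stringsAsFactors = FALSE)
}

print(compare_versions(expected_versions))
message("Running R ", as.character(getRversion()), "; the analysis used R 4.1.2.")
```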
