What is the SOMAR VDE?

The SOMAR Virtual Data Enclave (VDE) is a secure, remote desktop research environment operated by the Social Media Archive at ICPSR (SOMAR). The VDE enables researchers to work with sensitive or restricted social media datasets while maintaining the highest standards of data privacy, security, and compliance.

Supported software:

  • Programming languages: R and Python
  • Researchers can install R packages from CRAN and Python packages from PyPI
  • Analysis environments: RStudio, JupyterLab, Jupyter Notebooks
  • Statistical software: Stata (by request)
  • Qualitative analysis software: MAXQDA (by request)

Large Language Models (LLMs), Machine Learning, and AI tools:

  • The SOMAR VDE permits a variety of tools, in accordance with the ICPSR LLM policy
  • Custom and pretrained models may be available by request, following a security review by ICPSR staff
  • SOMAR staff will review and upload custom models for use in the VDE, provided the models can run in an isolated computing environment (e.g., they do not require internet access) and do not retain user-provided data
  • We support several models from HuggingFace, including:

High Performance Computing:

  • The SOMAR VDE offers multiple configurations of CPU and GPU compute resources to support your data analysis needs
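
Once a session is running, it can help to confirm which compute resources it actually exposes before starting a large job. The following is a minimal sketch using only the Python standard library. It assumes that any GPUs are NVIDIA devices and that the `nvidia-smi` utility is on the PATH; neither is confirmed for the SOMAR VDE, and the function name `describe_compute` is hypothetical.

```python
import os
import shutil
import subprocess


def describe_compute():
    """Summarize the CPU and GPU resources visible to this session.

    Assumption: GPUs, if present, are NVIDIA devices reported by nvidia-smi.
    Nothing here is specific to the SOMAR VDE.
    """
    summary = {"cpu_count": os.cpu_count(), "gpus": []}

    # If nvidia-smi is not installed, report CPU information only.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return summary

    try:
        result = subprocess.run(
            [nvidia_smi, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (for example, no driver is loaded).
        return summary

    # One line per GPU, containing the device name and its total memory.
    summary["gpus"] = [
        line.strip() for line in result.stdout.splitlines() if line.strip()
    ]
    return summary


if __name__ == "__main__":
    print(describe_compute())
```

An empty `gpus` list means only that no GPU was detected this way; it may simply reflect a CPU-only configuration.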