A standards-based platform for packaging, sharing, and comparing experiment results across distributed research teams. Built on the RO-Crate open standard with PROV-O provenance.
Three tools, one standards-based workflow.
Research Box replaces ad-hoc experiment data scattered across local directories with self-describing, portable archives that ensure results are interoperable, reproducible, and shareable.
CLI Tool
Create, validate, and submit RO-Crate experiment archives from the command line. Interactive archive creation with schema guidance, local inspection, and push-to-server with API key authentication.
Ingestion API
Centralised backend that validates archives against JSON Schema, extracts metadata, verifies SHA-256 checksums, and stores results. GraphQL API with presigned URL uploads.
Web Portal
Authenticated dashboard for browsing experiments, comparing results side-by-side, tracking trends across projects and institutions, uploading archives, and managing API credentials.
Archive Format
Self-describing RO-Crate archives with pluggable schemas. Each archive packages metadata, methodology, raw data, results, provenance, and integrity checksums in a single portable zip.
Built on open standards.
Every archive is self-describing, portable, and meaningful outside the Research Box ecosystem. No vendor lock-in — just open, interoperable experiment data.
RO-Crate 1.1
Research Object Packaging
The community standard for packaging research data with rich metadata. Each archive is a self-contained, machine-readable research object.
Pluggable Schemas
Domain-Specific Structure
Define the archive structure that fits your experiment type. Schemas validate methodology, data format, and metadata requirements — ensuring every submission is complete and consistent.
PROV-O
Provenance Ontology
W3C standard for describing provenance chains. Every experiment records who ran it, when, on what infrastructure, and how results were derived.
JSON Schema
Archive Validation
Strict structural validation of every archive before ingestion. Ensures experiment plans, infrastructure specs, and results conform to the required schema.
Purpose-built for reproducible, shareable experiment data.
Experiment results are too often scattered across local directories in ad-hoc formats that only their creators understand. Research Box provides a structured, validated archive format that makes results portable, comparable, and reproducible — across teams, institutions, and time.
"Research Box gives distributed teams a standards-based backbone for their experiment data — from packaging and submission through to comparison and trend analysis. It replaces ad-hoc file sharing with validated, self-describing archives."
Structured Archive Format
Each experiment is packaged as a standards-compliant RO-Crate archive containing methodology, raw data, results, infrastructure context, provenance, and SHA-256 integrity checksums. Schemas are pluggable — define the structure that fits your experiment type.
Side-by-Side Comparison
Group related experiments and compare them side-by-side. Visualise key metrics, infrastructure context, and methodology differences to understand how variables affect outcomes across experiment runs.
Cross-Project Trend Analysis
Track metrics across multiple projects and institutions over time. Aggregated summary statistics, multi-project comparisons, and filterable dashboards help teams identify patterns and measure progress.
API Key Authentication
Generate, manage, and revoke API credentials through the portal. Keys are SHA-256 hashed at rest; the plaintext is shown only once, at creation. The CLI and CI/CD pipelines authenticate with API keys supplied via flags, environment variables, or .env files.
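The hash-at-rest scheme can be sketched in a few lines of Python. This is an illustrative sketch, not the product's implementation: the rbx_ prefix and key length are assumptions, and the real backend runs on Lambda.

```python
import hashlib
import secrets

def create_api_key() -> tuple[str, str]:
    """Generate an API key; return (plaintext, stored_hash).

    The plaintext is shown to the user exactly once; only the
    SHA-256 digest is persisted, so a datastore leak does not
    expose usable credentials.
    """
    plaintext = "rbx_" + secrets.token_hex(20)  # hypothetical key format
    stored_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, stored_hash

def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)

key, stored_hash = create_api_key()
print(verify_api_key(key, stored_hash))        # True
print(verify_api_key("rbx_wrong", stored_hash))  # False
```

Revoking a key is then just deleting its stored hash; no plaintext ever needs to exist server-side after creation.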
How it works.
From local experiment data to shared, comparable results. Research Box handles the full lifecycle from archive creation to cross-project analysis.
Create
Use the CLI to interactively create an RO-Crate archive — metadata, experiment methodology, infrastructure context, raw data, and results. The schema guides you through each required section.
Validate
Archives are validated against JSON Schema definitions and SHA-256 checksums are verified — locally before submission and again server-side on ingestion.
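The checksum half of that validation can be sketched in Python. The manifest layout shown here (one `<hexdigest>  <path>` line per file, as produced by sha256sum) is an assumption about the manifest/checksums.sha256 format, and the file contents are illustrative; the real CLI is written in Go.

```python
import hashlib

def verify_checksums(files: dict[str, bytes], manifest: str) -> list[str]:
    """Check each archive entry against its SHA-256 manifest line.

    `files` maps archive paths to their raw bytes; `manifest` holds
    one '<hexdigest>  <path>' line per file. Returns the paths whose
    actual digest does not match the recorded one.
    """
    failures = []
    for line in manifest.strip().splitlines():
        digest, path = line.split(maxsplit=1)
        actual = hashlib.sha256(files[path]).hexdigest()
        if actual != digest:
            failures.append(path)
    return failures

# Hypothetical archive contents keyed by path.
files = {"results/summary.json": b'{"kpi": 0.42}'}
manifest = (
    hashlib.sha256(files["results/summary.json"]).hexdigest()
    + "  results/summary.json"
)
print(verify_checksums(files, manifest))  # prints [] (all checksums match)
```

Running the same check on both client and server means a corrupted upload is caught before it is ever indexed.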
Submit
Push archives via CLI or drag-and-drop in the web portal. The ingestion API extracts metadata, stores the archive in S3, and indexes results in the datastore.
Compare
View experiments, compare baseline vs. refactored results side-by-side, and track energy-efficiency trends across projects and institutions over time.
Built for teams who need structured, reproducible experiment data.
Any team producing experiment data that needs to be packaged, shared, and compared across institutions can deploy Research Box with a schema tailored to their domain.
Academic Research
Research groups producing experiment data that must be reproducible, citable, and shareable. Standards-based packaging ensures results are interoperable across institutions and disciplines.
Research Consortia
Multi-institutional programmes needing shared experiment datastores. Institution-scoped access, partner-level API credentials, and a central portal for comparing results across organisations.
Performance Benchmarking
Teams running systematic performance experiments. Capture infrastructure context, test methodology, and time-series measurements in portable, validated archives that others can reproduce.
Environmental Monitoring
Research programmes tracking environmental metrics, sensor readings, and sustainability indicators. Package observation data with full provenance and infrastructure context for long-term comparability.
Clinical & Life Sciences
Trial data, lab results, and observational studies that require full audit trails, version control, and validated data structures. Meet regulatory and institutional governance requirements.
CI/CD Integration
Automate experiment submission from build pipelines. The CLI and API key system supports headless operation — create, validate, and push archives as part of your continuous integration workflow.
ITEA GreenCode: measuring the energy cost of software.
Research Box was first deployed as the experiment datastore for the ITEA GreenCode programme — a multi-institution initiative to measure and reduce software energy consumption across Europe.
The Problem
Partner institutions across Europe were each running software energy-efficiency experiments — but storing results in local directories, spreadsheets, and ad-hoc formats. Results were impossible to compare, reproduce, or aggregate across institutions.
The Solution
Research Box was deployed with a custom energy-exp-run/1.0 schema. Each experiment archive packages the software under test, experiment methodology, hardware/OS context, time-series energy measurements, results KPIs, and a full provenance chain.
The Result
A shared, standards-based datastore at data.greencode.ai where partner institutions submit validated experiments and compare energy consumption trends across projects, code versions, and infrastructure configurations.
Software experiment schema
For software experiments, the archive includes a CodeMeta descriptor — the community standard for describing software — which captures the name, version, language, licence, authors, and repository URL of the codebase under test.
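Built as JSON-LD in Python, a minimal CodeMeta descriptor of that shape might look like the following. Every field value here is illustrative rather than taken from a real GreenCode submission; only the CodeMeta 2.0 context and property names are standard.

```python
import json

# Minimal CodeMeta descriptor for the codebase under test.
# All values are placeholders, not a real submission.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-service",
    "version": "1.4.2",
    "programmingLanguage": "Go",
    "license": "https://spdx.org/licenses/Apache-2.0",
    "author": [{"@type": "Person", "name": "Ada Example"}],
    "codeRepository": "https://github.com/example/example-service",
}

print(json.dumps(codemeta, indent=2))
```

Serialised to software/codemeta.jsonld inside the archive, this gives any downstream tool a machine-readable answer to "what software was measured, and which version?"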
Each experiment is tagged as either baseline or refactored, allowing the portal to group related runs and compare them side-by-side. This makes it straightforward to quantify the energy impact of specific code changes.
The schema also captures hardware context (CPU, memory, OS), test parameters (iterations, warm-up, sampling rate), and links everything through a PROV-O provenance chain — so every result is traceable back to who ran it, when, and on what.
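A stripped-down PROV-O chain of that shape can be expressed as JSON-LD; the sketch below uses only standard PROV-O terms (Entity, Activity, Agent, wasGeneratedBy, wasAssociatedWith, used), while the ex: identifiers and timestamp are placeholders, not the schema's actual naming.

```python
import json

# Minimal PROV-O chain: the results entity wasGeneratedBy the
# experiment run, which wasAssociatedWith a researcher and used
# a hardware-context entity. Identifiers are illustrative.
provenance = {
    "@context": {
        "prov": "http://www.w3.org/ns/prov#",
        "ex": "https://example.org/id/",  # placeholder namespace
    },
    "@graph": [
        {"@id": "ex:results", "@type": "prov:Entity",
         "prov:wasGeneratedBy": {"@id": "ex:run-001"}},
        {"@id": "ex:run-001", "@type": "prov:Activity",
         "prov:startedAtTime": "2024-05-01T09:00:00Z",
         "prov:wasAssociatedWith": {"@id": "ex:researcher"},
         "prov:used": {"@id": "ex:hardware"}},
        {"@id": "ex:researcher", "@type": "prov:Agent"},
        {"@id": "ex:hardware", "@type": "prov:Entity"},
    ],
}

print(json.dumps(provenance, indent=2))
```

Because each link is a standard PROV-O relation, any generic provenance tool can walk the chain from result back to agent without knowing anything about Research Box.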
Archive structure for the energy-exp-run/1.0 schema used in the GreenCode programme.
Frequently Asked Questions
What kinds of experiments can Research Box handle?
Research Box supports any structured experiment that can be described by a schema — from software performance benchmarks to environmental monitoring, clinical trials, and materials testing. The archive format is built around pluggable schemas, so you define the methodology, data, and metadata structure that fits your domain. Each archive includes experiment plans, raw data, results, infrastructure context, and PROV-O provenance.
How are experiments packaged and submitted?
Experiments are packaged as RO-Crate zip archives using the CLI tool (ropackager create) and submitted either via the CLI (ropackager push) or drag-and-drop in the web portal. Archives are validated against JSON Schema definitions and SHA-256 checksums on both client and server before ingestion.
How do authentication and access control work?
Authentication is email-based via AWS Cognito, with a custom attribute recording each user's partner institution. Experiments are tagged with the submitting institution. API keys can be generated per-user for CLI and CI/CD access, with full create/revoke lifecycle management through the portal.
What does an archive actually contain?
The exact structure depends on the schema, but a typical RO-Crate archive includes: ro-crate-metadata.jsonld (root descriptor), experiment/ (methodology and parameters), infrastructure/ (environment context), runs/ (raw measurement data), results/ (summary metrics and KPIs), provenance/provenance.jsonld (PROV-O chain), and manifest/checksums.sha256 (SHA-256 integrity verification). Software experiment schemas also include software/codemeta.jsonld for describing the codebase under test.
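A minimal structural check over that typical layout can be written against Python's zipfile module. This is a sketch only: the required-entry list mirrors the layout described above, and real validation also runs full JSON Schema and checksum verification.

```python
import io
import zipfile

# Top-level entries expected in a typical archive (layout as
# described above; the exact set depends on the schema in use).
REQUIRED = [
    "ro-crate-metadata.jsonld",
    "experiment/",
    "infrastructure/",
    "runs/",
    "results/",
    "provenance/provenance.jsonld",
    "manifest/checksums.sha256",
]

def missing_entries(archive_bytes: bytes) -> list[str]:
    """Return the required entries absent from the zip's name list."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = set(zf.namelist())
    # A directory counts as present if any member sits under it.
    return [
        entry for entry in REQUIRED
        if entry not in names
        and not any(name.startswith(entry) for name in names)
    ]

# Build an incomplete in-memory archive and inspect what is missing.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ro-crate-metadata.jsonld", "{}")
    zf.writestr("runs/run-001.csv", "t,joules\n")
print(missing_entries(buf.getvalue()))
```

A check like this is cheap enough to run client-side before upload, so obviously incomplete archives never reach the ingestion API.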
Can Research Box run in CI/CD pipelines?
Yes. The CLI tool supports headless operation with API key authentication via flags, environment variables, or .env files. Archives can be created and pushed as part of automated build and test pipelines. The GraphQL API also supports direct integration for custom tooling.
What is Research Box built with?
The portal is built with React, Tailwind CSS, and Recharts. The backend runs on AWS Amplify Gen 2 with AppSync (GraphQL), Cognito (auth), DynamoDB (datastore), S3 (archive storage), and Lambda (serverless compute). The CLI is built in Go. Validation uses AJV (JSON Schema) with SHA-256 checksums for archive integrity.
