Data repository

OME Tuesday meeting

15th July 2015

Sébastien Besson

Bio-Formats metrics

Presentation Outline

  • Data repository structure
  • Bio-Formats automated tests
  • Future work

OME repository

Squig mounted as /ome/

Folder         Description
apache_repo    QA file store
ci             Continuous Integration repository (reference DBs)
data_repo      Data repository
Documentation  Model documentation
team           Miscellaneous internal content
www            Static Web content (downloads, schemas)

QA file store

  • Files uploaded via the QA application
  • Stored under /ome/apache_repo, e.g. /ome/apache_repo/11378/Example_MJ.dm4

Data repository

Folder            Description                      Symlinks
ci / config       Continuous Integration           no
from_*            File format datasets             no
from_skyking      Curated datasets                 no
test_images_*     Testing subsets of from_skyking  yes
test_images_good  Testing subset                   yes
public            Public datasets                  yes
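
The testing subsets are listed as containing symlinks rather than copies. A minimal sketch of how such a subset could be populated is shown here; the function name link_subset and the example relative path are hypothetical, and the actual procedure used to build test_images_* is not described in these slides.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def link_subset(source_root: Path, subset_root: Path,
                relative_paths: Iterable[str]) -> List[Path]:
    """Create symlinks under subset_root that point at files under source_root.

    Hypothetical helper: the directory layout of the source is mirrored in
    the subset so that each link keeps its format folder.
    """
    created = []
    for rel in relative_paths:
        target = source_root / rel
        if not target.exists():
            raise FileNotFoundError(target)
        link = subset_root / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if not link.is_symlink():
            # Link to the curated file instead of copying it.
            os.symlink(target, link)
        created.append(link)
    return created


if __name__ == "__main__":
    # Demonstration in a temporary directory; names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        curated = Path(tmp) / "from_skyking"
        (curated / "dv").mkdir(parents=True)
        (curated / "dv" / "sample.dv").write_bytes(b"demo")
        subset = Path(tmp) / "test_images_good"
        for link in link_subset(curated, subset, ["dv/sample.dv"]):
            print(link, "->", os.readlink(link))
```

Linking keeps a single copy of each file in the curated tree, so the subset cannot drift from its source.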

Curated datasets

  • Root directory /ome/data_repo/from_skyking
    • Datasets primarily organized by format
    • Subfolders contain one or multiple filesets
  • Group readable/not writeable
  • QA datasets deep copied
  • Tested daily by the Bio-Formats automated tests
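
The "group readable/not writeable" rule can be checked mechanically. The following is a rough sketch that only inspects regular files; the function names are hypothetical, and the real permission scheme of the repository (directories, ACLs, ownership) may be richer than this.

```python
import stat
from pathlib import Path
from typing import Iterator


def group_policy_ok(mode: int) -> bool:
    """Return True if the mode grants group read but not group write."""
    return bool(mode & stat.S_IRGRP) and not (mode & stat.S_IWGRP)


def find_policy_violations(root: Path) -> Iterator[Path]:
    """Yield regular files under root whose permissions break the rule.

    Symlinks are skipped, since their own mode bits are not meaningful.
    """
    for path in sorted(root.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        if not group_policy_ok(path.stat().st_mode):
            yield path


if __name__ == "__main__":
    # Root directory taken from the slide; adjust to the local mount point.
    for offender in find_policy_violations(Path("/ome/data_repo/from_skyking")):
        print(offender)
```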

Bio-Formats automated tests

  • See the Bio-Formats developer documentation
  • Ant automated-tests target
    • Scans files under root directory testng.repository
    • Scans matching configuration files under root configuration directory testng.configRepository
    • Runs a series of tests on each configured fileset, including reading metadata and opening bytes
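
As an illustration, the Ant invocation could be assembled as below. The target name and the two property names are copied from this slide and should be checked against the Bio-Formats developer documentation before use; passing them as -D properties and the helper name automated_tests_command are assumptions, and the configuration path is a placeholder.

```python
import shlex
from typing import List


def automated_tests_command(repository: str, config_repository: str) -> List[str]:
    """Build the argument list for the Ant automated tests.

    Assumption: the two directories are supplied as Java system properties
    named as on the slide (testng.repository, testng.configRepository).
    """
    return [
        "ant",
        f"-Dtestng.repository={repository}",
        f"-Dtestng.configRepository={config_repository}",
        "automated-tests",
    ]


if __name__ == "__main__":
    cmd = automated_tests_command(
        "/ome/data_repo/from_skyking",  # curated datasets root (from the slides)
        "/path/to/configuration",       # placeholder, not a real location
    )
    # Print the command instead of running it; use subprocess.run(cmd) to execute.
    print(shlex.join(cmd))
```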

Continuous Integration repository jobs

  • Full repository jobs
    • Testing the curated datasets repository
    • Both Linux and Windows
    • Running daily (or every 2 days for Windows)
    • 300K tests
  • Test images good jobs
    • Testing the test_images_good datasets repository
    • 300K tests
  • Repository subset jobs
    • Merges Bio-Formats PRs
    • Logic to test one or multiple formats
    • Primarily used for development

Run automated tests against one (or multiple) file formats
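
One way to picture the format selection is a lookup from requested format names to folders of the repository. This sketch assumes that each first-level folder under the root is named after a format, which the slides only describe as the primary organization; select_format_dirs is a hypothetical helper, not the logic used by the CI jobs.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def select_format_dirs(root: Path, formats: Iterable[str]) -> List[Path]:
    """Return the folders under root matching the requested format names.

    Matching is case-insensitive. Unknown names raise ValueError so that a
    typo does not silently produce an empty test run.
    """
    available = {p.name.lower(): p for p in root.iterdir() if p.is_dir()}
    selected, missing = [], []
    for name in formats:
        match = available.get(name.lower())
        if match is None:
            missing.append(name)
        else:
            selected.append(match)
    if missing:
        raise ValueError(f"no folder for format(s): {', '.join(missing)}")
    return selected


if __name__ == "__main__":
    # Demonstration on a temporary tree; folder names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("dv", "leica-lif", "zeiss-czi"):
            (root / name).mkdir()
        for folder in select_format_dirs(root, ["dv", "Leica-LIF"]):
            print(folder.name)
```

Each selected folder could then be passed as the root directory of a separate test run.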

Configuration repository

  • Private repository hosting the .bioformats configuration files for the automated tests
  • Configuration files can be modified by PR workflow
  • Merged as part of the daily job suite - see DATA_REPO_CONFIG-merge
  • Tagged as part of the release process
  • Improves the process during significant reader modifications (e.g. Leica LIF)

Future work

  • Data repository
    • Towards more parallelization using new IDR hardware
    • Background maintenance (curation, cleanup...)
  • Configuration repository
    • More metadata support
    • Usage for stats
    • A tool for searching data?

Thank you