                   Using Bio-Formats Guide by Melissa Linkert

                                  Overview
                                 ----------

This document describes various things that are useful to know when working
with Bio-Formats.  It is recommended that you obtain the Bio-Formats source
by following the directions at http://www.loci.wisc.edu/software, rather than
using an official release.  It is also recommended that you have a copy of the
JavaDocs nearby; the notes that follow will make more sense when you see the
API.

For a complete list of supported formats, see the Bio-Formats home page:
http://www.loci.wisc.edu/ome/formats.html

                              Basic File Reading
                             --------------------

Bio-Formats provides several methods for retrieving data from files in an
arbitrary (supported) format.  These methods fall into three categories: raw
pixels, core metadata, and format-specific metadata.  All methods described here
are present and documented in loci.formats.IFormatReader - it is advised that
you take a look at the source and/or JavaDoc.  In general, it is recommended
that you read files using an instance of ImageReader.  While it is possible to
work with readers for a specific format, ImageReader contains additional logic
to automatically detect the format of a file and delegate subsequent calls to
the appropriate reader.

Prior to retrieving pixels or metadata, it is necessary to call setId(String)
on the reader instance, passing in the name of the file to read.

Raw pixels are always retrieved one plane at a time.  Planes can be returned
either in a byte array, or in a java.awt.image.BufferedImage (using
openBytes(int) and openImage(int) respectively).  It is entirely
up to you which method to use, as the pixel values are always identical.
In general, BufferedImages are more convenient for viewer applications and
applications that don't need to perform computations on pixel data, while byte
arrays are better for applications that perform pixel manipulations.
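
As an illustration of working with a byte-array plane, the following is a
minimal sketch that uses only java.nio.  It assumes the plane holds 16-bit
samples and that the byte order is already known (for example from
isLittleEndian(), listed below).  PlaneUnpacker is a hypothetical helper, not
a Bio-Formats class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of Bio-Formats.
final class PlaneUnpacker {
  private PlaneUnpacker() { }

  /** Converts a plane of 16-bit samples into unsigned values (0..65535). */
  static int[] unpackUInt16(byte[] plane, boolean littleEndian) {
    if (plane.length % 2 != 0) {
      throw new IllegalArgumentException("Odd byte count: " + plane.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(plane);
    buf.order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
    int[] pixels = new int[plane.length / 2];
    for (int i = 0; i < pixels.length; i++) {
      // getShort() is signed; mask to recover the unsigned sample value
      pixels[i] = buf.getShort() & 0xffff;
    }
    return pixels;
  }

  public static void main(String[] args) {
    byte[] plane = {0x01, 0x02, (byte) 0xff, (byte) 0xff};
    int[] pixels = unpackUInt16(plane, true);
    System.out.println(pixels[0] + " " + pixels[1]);  // prints "513 65535"
  }
}
```

In real use the byte array would come from openBytes(int) rather than from a
literal, and other pixel types would need their own unpacking logic.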

Core metadata is the general term for anything that might be needed to work with
the planes in a file.  A list of core metadata fields is given below, with the
appropriate accessor method in parentheses:

- image width (getSizeX())
- image height (getSizeY())
- total number of images per file (getImageCount())
- number of slices per file (getSizeZ())
- number of timepoints per file (getSizeT())
- number of actual channels per file (getSizeC())
- number of channels per image (getRGBChannelCount())
- the ordering of the images within the file (getDimensionOrder())
- whether each image is RGB (isRGB())
- whether the pixel bytes are in little-endian order (isLittleEndian())
- whether the channels in an image are interleaved (isInterleaved())
- the type of pixel data in this file (getPixelType())

All file formats are guaranteed to accurately report core metadata.
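
To show how the sizes and the dimension order fit together, here is a rough
sketch that maps a (z, c, t) coordinate to a plane number.  It assumes the
dimension order is a five-letter string such as "XYZCT", with the letters after
"XY" listed from fastest-varying to slowest-varying, and that every channel
occupies its own plane (no RGB planes).  PlaneIndex is a hypothetical helper,
not a Bio-Formats class.

```java
// Hypothetical helper for illustration; not part of Bio-Formats.
final class PlaneIndex {
  private PlaneIndex() { }

  /**
   * Maps (z, c, t) to a plane number.  The order string is assumed to be "XY"
   * followed by a permutation of Z, C and T, fastest-varying dimension first.
   */
  static int toIndex(String order, int sizeZ, int sizeC, int sizeT,
    int z, int c, int t)
  {
    if (order == null || !order.matches("XY[ZCT]{3}") ||
      order.indexOf('Z') < 0 || order.indexOf('C') < 0 ||
      order.indexOf('T') < 0)
    {
      throw new IllegalArgumentException("Bad dimension order: " + order);
    }
    if (z < 0 || z >= sizeZ || c < 0 || c >= sizeC || t < 0 || t >= sizeT) {
      throw new IllegalArgumentException("Coordinate out of range");
    }
    int index = 0;
    int stride = 1;  // number of planes spanned by one step in this dimension
    for (int i = 2; i < 5; i++) {
      switch (order.charAt(i)) {
        case 'Z': index += z * stride; stride *= sizeZ; break;
        case 'C': index += c * stride; stride *= sizeC; break;
        default:  index += t * stride; stride *= sizeT; break;  // 'T'
      }
    }
    return index;
  }

  public static void main(String[] args) {
    // 3 slices, 2 channels, 4 timepoints; z=1, c=1, t=2 in XYZCT order
    System.out.println(toIndex("XYZCT", 3, 2, 4, 1, 1, 2));  // prints 16
  }
}
```

The resulting number is the kind of plane index that openBytes(int) and
openImage(int) expect; consult the JavaDoc before relying on this mapping for
RGB data.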

Format-specific metadata refers to any other data specified in the file - this
includes acquisition and hardware parameters, among other things.  This data
is stored internally in a java.util.Hashtable, and can be accessed in one of
two ways: individual values can be retrieved by calling
getMetadataValue(String), which gets the value of the specified key.
Alternatively, getMetadata() will return the entire Hashtable.
Note that the keys in this Hashtable are different for each format, hence the
name "format-specific metadata".
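
Because the keys vary by format, a common first step is simply to list
everything in the table.  The sketch below uses only java.util and assumes you
already hold the Hashtable returned by getMetadata(); the keys and values in
main() are made up for illustration, and MetadataDump is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;

// Hypothetical helper for illustration; not part of Bio-Formats.
final class MetadataDump {
  private MetadataDump() { }

  /** Returns one "key = value" line per entry, sorted alphabetically. */
  static List<String> toLines(Hashtable<?, ?> meta) {
    List<String> lines = new ArrayList<String>();
    for (Object key : meta.keySet()) {
      lines.add(key + " = " + meta.get(key));
    }
    Collections.sort(lines);
    return lines;
  }

  public static void main(String[] args) {
    // Made-up entries standing in for a real reader's metadata table
    Hashtable<String, Object> meta = new Hashtable<String, Object>();
    meta.put("Objective", "40x");
    meta.put("Gain", Double.valueOf(1.5));
    for (String line : toLines(meta)) {
      System.out.println(line);
    }
  }
}
```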

                             File Reading Extras
                            ---------------------

The previous section described how to read pixels as they are stored in the
file.  However, the native format isn't necessarily convenient, so Bio-Formats
provides a few extras to make file reading more flexible.

- There are a few "wrapper" readers (that implement IFormatReader) that take a
  reader in the constructor, and manipulate the results somehow, for
  convenience. Using them is similar to the java.io InputStream/OutputStream
  model: just layer whichever functionality you need by nesting the wrappers.
  + FileStitcher implements IFormatReader, and uses advanced pattern
    matching heuristics to group files that belong to the same dataset.
  + ChannelSeparator implements IFormatReader, and makes sure that
    all planes are grayscale - RGB images are split into 3 separate grayscale
    images.
  + ChannelMerger implements IFormatReader, and merges grayscale
    images to RGB if the number of channels is greater than 1.
  + MinMaxCalculator implements IFormatReader, and provides an API
    for retrieving the minimum and maximum pixel values for each channel.
  + DimensionSwapper implements IFormatReader, and provides an API
    for changing the dimension order of a file.
- ImageTools provides a number of methods for manipulating BufferedImages and
  primitive type arrays.  In particular, there are methods to split and merge
  channels in a BufferedImage/array, as well as converting to a specific data
  type (e.g. convert short data to byte data).
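
The wrapper idea can be sketched without Bio-Formats at all.  The example below
is only an analogy: PlaneSource is a hypothetical interface far smaller than
IFormatReader, and SplitChannels is not how ChannelSeparator is implemented.
It assumes 8-bit interleaved RGB planes (RGBRGB...).

```java
// Sketch of the wrapper pattern.  PlaneSource and both classes are
// hypothetical stand-ins; the real wrappers implement IFormatReader.
interface PlaneSource {
  int getPlaneCount();
  byte[] openPlane(int no);
}

/** In-memory source whose planes hold interleaved 8-bit RGB (RGBRGB...). */
final class InterleavedRgbSource implements PlaneSource {
  private final byte[][] planes;

  InterleavedRgbSource(byte[][] planes) { this.planes = planes; }

  public int getPlaneCount() { return planes.length; }

  public byte[] openPlane(int no) { return planes[no].clone(); }
}

/** Wrapper that presents every RGB plane as three grayscale planes. */
final class SplitChannels implements PlaneSource {
  private final PlaneSource source;

  SplitChannels(PlaneSource source) { this.source = source; }

  public int getPlaneCount() { return source.getPlaneCount() * 3; }

  public byte[] openPlane(int no) {
    byte[] rgb = source.openPlane(no / 3);  // which wrapped plane to read
    int channel = no % 3;                   // 0 = R, 1 = G, 2 = B
    byte[] gray = new byte[rgb.length / 3];
    for (int i = 0; i < gray.length; i++) {
      gray[i] = rgb[i * 3 + channel];
    }
    return gray;
  }
}
```

Since the wrapper implements the same interface as the thing it wraps, callers
do not need to know whether they hold a plain source or a wrapped one, and
further wrappers can be layered on top in the same way.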

                                Writing Files
                               ---------------

The following file formats can be written using Bio-Formats:

- TIFF (uncompressed or LZW)
- JPEG
- PNG
- AVI (uncompressed)
- QuickTime (uncompressed is supported natively; additional codecs use QTJava)
- Encapsulated PostScript (EPS)

We are planning support for OME-XML in the near future.

The writer API (see loci.formats.IFormatWriter) is very similar to the reader
API, in that files are written one plane at a time (rather than all at once).

All writers allow the output file to be changed before the last plane has
been written.  This allows you to write to any number of output files using
the same writer and output settings (compression, frames per second, etc.),
and is especially useful for formats that do not support multiple images per
file.

A word of warning: IFormatWriter.saveImage(Image, boolean) accepts
generic java.awt.Images, and converts them to a BufferedImage under the hood.
The problem is that not all formats support all types of data (e.g. JPEG
does not support 16-bit data).  To prevent the possibility of corrupt or
invalid files, it is important to check that the pixel type of the Image you
supply to saveImage() is supported by the writer.  This can be done using the
isSupportedType and getPixelTypes methods of IFormatWriter.
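
The writer methods named above are the authoritative check.  As a complementary
sketch that uses only java.awt.image, the sample depth of a BufferedImage can
be inspected before it is handed to a writer.  SampleDepth is a hypothetical
helper, and the 8-bit threshold is just an example based on the JPEG case
mentioned above.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Bio-Formats.
final class SampleDepth {
  private SampleDepth() { }

  /** Returns the largest number of bits used by any band of the image. */
  static int maxBitsPerSample(BufferedImage img) {
    int max = 0;
    for (int bits : img.getSampleModel().getSampleSize()) {
      max = Math.max(max, bits);
    }
    return max;
  }

  /** True if no band needs more than 8 bits per sample. */
  static boolean fitsIn8Bits(BufferedImage img) {
    return maxBitsPerSample(img) <= 8;
  }

  public static void main(String[] args) {
    BufferedImage gray16 =
      new BufferedImage(4, 4, BufferedImage.TYPE_USHORT_GRAY);
    System.out.println(fitsIn8Bits(gray16));  // prints "false"
  }
}
```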

Please see the Movie Stitcher (loci.apps.stitcher) for an example of how
to write files using Bio-Formats.

                    Arcane Notes and Implementation Details
                   -----------------------------------------

Following is a list of known oddities.

o IFormatWriter accepts Image objects (not just BufferedImages); yet all
  writers convert the Image to a BufferedImage.  You can still pass in a
  BufferedImage, but you are free to pass in any Image object.

o All readers have another openBytes method that takes a pre-allocated byte
  array, but there is no corresponding method for openImage.  The
  rationale behind pre-allocated byte arrays is (1) array allocation takes
  a relatively long time; and (2) pre-allocation avoids memory spikes on the
  heap.  The reason there isn't something similar for openImage (i.e., a method
  that takes a pre-allocated BufferedImage) is that it's kind of a pain to
  implement, and no one has cared so far.  If you want this method, we can work
  towards adding it.

o Importing multi-file formats (Leica LEI, PerkinElmer, FV1000 OIF, ICS, and
  Prairie TIFF) can fail if any of the files are renamed.  There are
  "best guess" heuristics in these readers, but they aren't guaranteed to work
  in general.  So please don't rename files in these formats.

o If you are working on a Macintosh, make sure that the data and resource forks
  of your image files are stored together.  Bio-Formats does not handle
  separated forks (the native QuickTime reader tries, but usually fails).

o Through specialized I/O classes, Bio-Formats is able to control the number of
  open file descriptors (in the current JVM).  Currently, the maximum is 200,
  which is lower than the default per-process limit on most systems.  Side note
  on I/O: the reasoning behind writing our own I/O classes (see
  loci.formats.RandomAccessStream) is (1) InputStreams are fast at reading
  data sequentially, but cannot do random access; (2) RandomAccessFiles are
  great for random access, but less efficient for sequential reading; (3) we
  needed RandomAccessFile-like functionality for byte arrays; and (4) we wanted
  to be able to read from disk, over HTTP, and potentially other sources.  The
  result is a hybrid class that extends InputStream and implements DataInput to
  meet all of our goals.

o RLE-compressed QuickTime movies will look funny if the planes are not read
  in sequential order, since proper decoding of a particular plane can depend
  on the previous plane.
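
To make the I/O note above more concrete, here is a small sketch of the hybrid
idea: an InputStream over a byte array that also supports seeking.  It is not
the RandomAccessStream implementation.  SeekableByteStream and its seek() and
getPosition() methods are hypothetical names, and the DataInput methods are
omitted to keep the example short.

```java
import java.io.InputStream;

// Hypothetical class for illustration; not loci.formats.RandomAccessStream.
final class SeekableByteStream extends InputStream {
  private final byte[] data;
  private int pos;

  SeekableByteStream(byte[] data) { this.data = data; }

  /** Moves the read position; the next read starts at newPos. */
  void seek(int newPos) {
    if (newPos < 0 || newPos > data.length) {
      throw new IllegalArgumentException("Position out of range: " + newPos);
    }
    pos = newPos;
  }

  int getPosition() { return pos; }

  @Override
  public int read() {
    // Return the next byte as 0..255, or -1 at the end of the data
    return pos < data.length ? data[pos++] & 0xff : -1;
  }

  @Override
  public int read(byte[] b, int off, int len) {
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    }
    if (len == 0) return 0;
    if (pos >= data.length) return -1;
    int n = Math.min(len, data.length - pos);
    System.arraycopy(data, pos, b, off, n);
    pos += n;
    return n;
  }

  @Override
  public int available() { return data.length - pos; }
}
```

Sequential callers can treat this as an ordinary InputStream, while code that
needs random access can call seek() between reads.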
