RE: [buildcheapeeg] EDF and DDF file formats

From: John Morrison (jmorrison_at_ahc.net.au)
Date: 2002-03-12 12:43:08


> If we're working from EDF, then I'd like to see this additional option
> (this could perhaps be added later if necessary):
>
> - EDF allows only 16-bit signed integers. I would also like the
> option of storing 32-bit floats, just for future expandability
> (e.g. the 18-bit or 24-bit systems that have been discussed).
It would probably be better not to play with EDF, as it's a standard.
Instead, as you've done below, create our own format.
Then just create a converter.

> Thinking of the files I've been working with from Jim-M, there will
> still be data loss when converting from the serial format to this
> chunked binary format, because there is no way to represent sync loss.
> In some of Jim-M's files there is sync loss, but that isn't a reason
> to discard the whole file (especially if it is an important session).
> Perhaps having an error channel which stores just one bit per sample
> would do the job -- each bit is 0: no errors, 1: sync or other error.
Our format should take account of this.
BUT what will happen in filters when they hit this???
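
One possible answer, as a rough Python sketch: expand the error bits into one flag per sample and let the filter decide what to do with flagged samples. The function names are made up for illustration, and I am assuming the bits are packed with bit 0 of each byte first.

```python
def unpack_error_bits(packed, n_samples):
    """Expand 1-bit-per-sample error flags, taking bit 0 of each byte first."""
    flags = []
    for i in range(n_samples):
        byte = packed[i // 8]
        flags.append(bool((byte >> (i % 8)) & 1))
    return flags


def mask_bad_samples(samples, flags, fill=None):
    """Replace samples flagged as errors with `fill` so a filter can skip them."""
    return [fill if bad else s for s, bad in zip(samples, flags)]
```

A filter could then skip the `None` entries, hold the last good value, or reset its state, whichever suits that filter.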

> This problem could be avoided by putting all the XML in a separate
> file, so we have files in pairs -- the session description, and the
> session binary data.
Not a bad idea. That is what BioGraph does, and it seems like the best way to
handle things. :-)

> This would also allow a simple program (e.g. BWView) to deal with the
> raw session data without having any knowledge of XML. The XML files
> could also be collected together to put in Andreas's planned database
> system, giving both portability of session files (just copy the pair
> of files), and database capability (if that is interesting for someone
> to develop further down the line).
In the beginning we can just define a directory structure to store the data.
Maybe session directories???
That way it is a simple matter to find the files you want!

> I don't know about the XML side of the whole thing (or even whether
> XML is good for this), but for the binary bit, we can either use the
> EDF format, with most of the header fields blank, or define a
> shortened header format for our own purposes. I can define the binary
> format if this approach looks useful.

> Jim

> P.S. I might as well define a possible format for the binary file
> while we're discussing it. The following format assumes that we put
> most of the descriptive stuff into a separate file (XML based,
> perhaps).
>
> [ Giving credit where it is due, this format is based on the EDF thing
> and recent discussions, the stuff John Morrison turned up originally
> and the comments from Dave about the BioGraph format. ]
>
> All multi-byte values are stored in little-endian order. Note that
> there is no guarantee that values are aligned on 2-byte or 4-byte
> boundaries, especially in the channel data chunks.
>
> Length  Value
> -------------------------------------------------------------------
> 16      "OpenEEG-1.0", padded with NULs to 16 bytes
>
> 4       Length of global header data following (integer, == 12 currently)
> 4       Number of channels (integer)
> 4       Bytes per data chunk (integer)
> 4       Duration of data chunk in seconds (32-bit float)
> ??      Additional global header data (0 bytes, currently)
>
> 4       Length of header data per channel (integer, == 12 currently)
> {
>   4     Byte-count for this channel per data chunk (integer)
>   4     Format (integer):
>           's'  2-byte little-endian signed integers
>           'f'  4-byte little-endian IEEE-*** floats
>           'e'  1-bit error values, packed 8 to a byte, b0->b7
>   4     Scaling factor for data (if appropriate, else 1.0) (32-bit float)
>   ??    Additional per-channel header data (0 bytes, currently)
> } x number of channels
>
> {
>   ??    data for first channel
>   ??    data for second channel
>   ::    (etc)
>   ??    data for last channel
>   ??    padding if for some reason the data chunk length is
>         greater than sum of channel byte-counts
> } x (data chunks repeated to end of file)
> -------------------------------------------------------------------
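
To check that I read the layout correctly, here is a minimal Python sketch of a header reader. It is only my interpretation of the table above: the function name and dictionary keys are invented, and I am assuming the "Format" integer simply holds the ASCII code of the letter.

```python
import struct


def parse_header(buf):
    """Parse the proposed header; return (header dict, offset of first data chunk)."""
    magic = buf[:16].rstrip(b"\0").decode("ascii")
    pos = 16
    (global_len,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    n_channels, chunk_bytes, chunk_secs = struct.unpack_from("<iif", buf, pos)
    pos += global_len  # also skips any additional global header data
    (chan_len,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    channels = []
    for _ in range(n_channels):
        byte_count, fmt, scale = struct.unpack_from("<iif", buf, pos)
        # Assumption: the format integer is the ASCII code of 's', 'f' or 'e'.
        channels.append({"bytes": byte_count, "format": chr(fmt), "scale": scale})
        pos += chan_len  # also skips any additional per-channel data
    header = {
        "magic": magic,
        "n_channels": n_channels,
        "chunk_bytes": chunk_bytes,
        "chunk_secs": chunk_secs,
        "channels": channels,
    }
    return header, pos
```

Because the reader advances by the stored lengths rather than by a fixed 12 bytes, it should keep working if later minor versions add fields to either header.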

One thing I'd like to see is a method of adding MARKS to the file
(annotation, I think it's called).
That way the "experimenter" could mark the file (press a button) during a
session when events happen, then come back later to those spots and
add comments (in an external file tied to those points!).

This could be as simple as a single byte (giving 255 positions) or 2 bytes
(giving us 65535 positions) at the start of every data block. Zero
means no comment.
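
A rough Python sketch of the 2-byte variant, just to show the idea. It assumes the mark is an unsigned little-endian value placed before the channel data in each block, and the function names and the comments dictionary are hypothetical.

```python
import struct


def split_mark(chunk):
    """Return (mark_id, channel_data) for a block that starts with a 2-byte mark."""
    (mark,) = struct.unpack_from("<H", chunk, 0)
    return mark, chunk[2:]


def comment_for(mark, comments):
    """Look up the external comment for a mark; zero means no comment."""
    if mark == 0:
        return None
    return comments.get(mark, "(mark %d, no comment yet)" % mark)
```

The `comments` dictionary stands in for whatever the external comment file ends up being.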

And another thing -
Add a byte per channel to define the type of data recorded,
e.g. 1 = EEG, 2 = ECG, etc.
That way simple programs will know what is in each channel without
reading anything else.
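
In Python terms, something like the sketch below. Only 1 = EEG and 2 = ECG come from my suggestion above; reserving 0 for "unknown" and the helper name are assumptions for illustration.

```python
from enum import IntEnum


class ChannelType(IntEnum):
    UNKNOWN = 0  # assumption: 0 reserved for "not specified"
    EEG = 1
    ECG = 2


def describe(type_byte):
    """Return a readable name, falling back gracefully for codes we don't know."""
    try:
        return ChannelType(type_byte).name
    except ValueError:
        return "type %d" % type_byte
```

A program that meets a code it doesn't know can still display the channel, it just can't label it.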

> Features of this:
>
> - The file type is recognisable from the "OpenEEG..." header, and we
> have a version number. I suggest that programs check the major
> version number. However, different minor version number formats
> should be readable by any code that understands the major version
> format.
I like the idea!
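
A minimal Python sketch of that check, assuming the 16-byte field always has the form "OpenEEG-major.minor" padded with NULs. The function name is made up.

```python
def check_magic(magic_field, supported_major=1):
    """Validate the 16-byte identifier and return (major, minor)."""
    text = magic_field.rstrip(b"\0").decode("ascii")
    name, _, version = text.partition("-")
    if name != "OpenEEG":
        raise ValueError("not an OpenEEG file")
    major_s, _, minor_s = version.partition(".")
    major, minor = int(major_s), int(minor_s or 0)
    if major != supported_major:
        raise ValueError("unsupported major version %d" % major)
    return major, minor  # any minor version is accepted
```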

> - The aim of the format is simply to allow the stored data to be
> unpacked into streams of scaled floating-point values. The meanings
> and ranges of those data values are not defined here.

> - If we add new "format" types, then older programs can just choose to
> skip over them without understanding them, as we are storing a
> byte-count rather than a sample-count for the per-channel data.
> This also means that programs that don't want to bother with reading
> an error channel can just skip over that without having to
> understand it.
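
As I understand it, the skipping would work roughly like this Python sketch, where `channels` is a hypothetical list of (byte count, format letter) pairs taken from the per-channel headers:

```python
KNOWN_FORMATS = {"s", "f", "e"}


def split_chunk(chunk, channels):
    """Cut one data chunk into per-channel byte strings, skipping unknown formats."""
    out = []
    pos = 0
    for byte_count, fmt in channels:
        raw = chunk[pos:pos + byte_count]
        pos += byte_count  # always advance, even past formats we don't understand
        if fmt in KNOWN_FORMATS:
            out.append((fmt, raw))
    return out  # anything left after `pos` is padding
```

A program that ignores the error channel would simply leave "e" out of its known set.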

> - The scaling factor is to allow us to store, say, 12-bit data in the
> 16-bit signed data type ('s'), but to scale it correctly to a range
> such as -1 to +1 when it is displayed or processed.
Hmm, I like that idea. :-)
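
For example, 12-bit signed data covers -2048 to +2047, so a scaling factor of 1/2048 maps it onto roughly -1 to +1. A small Python sketch, with an illustrative function name:

```python
import struct


def unpack_scaled_s16(raw, scale):
    """Unpack 's' data (2-byte little-endian signed integers) and apply the scale."""
    n = len(raw) // 2
    values = struct.unpack("<%dh" % n, raw[:2 * n])
    return [v * scale for v in values]


raw = struct.pack("<3h", -2048, 0, 1024)
print(unpack_scaled_s16(raw, 1.0 / 2048))  # [-1.0, 0.0, 0.5]
```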

> - There is room for expanding the format to put more data in the
> header at a later date (e.g. for later minor versions of the format)
> without breaking older programs, so long as those programs read the
> "Length of global header data following" and "Length of header data
> per channel" fields, and skip over any bytes where indicated.

> - The number of samples per chunk can be calculated from the format
> type and bytes-per-chunk value. This works for all but 'e', where
> there might be 1-7 extra bits left over at the end. However 'e'
> describes other data streams, so the meaning is clear here. (If
> really necessary we could put a "Samples per chunk" value in the
> per-channel header, but I think it would be redundant).
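
Spelled out as a Python sketch (function names are illustrative): for 's' and 'f' the count is exact, while for 'e' the byte count only gives an upper bound, and the real count comes from the data channels.

```python
def samples_per_chunk(fmt, byte_count):
    """Samples held in one chunk for a channel; for 'e' this is the bit capacity."""
    if fmt == "s":
        return byte_count // 2
    if fmt == "f":
        return byte_count // 4
    if fmt == "e":
        return byte_count * 8  # up to 7 of these bits may be unused
    raise ValueError("unknown format %r" % fmt)


def error_bytes_needed(n_samples):
    """Bytes needed to hold one error bit per sample, rounded up."""
    return (n_samples + 7) // 8
```

So a chunk of 250 samples needs 500 bytes as 's' data and 32 bytes of error bits, with 6 bits left over.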

> - I'm not sure if IEEE 32-bit floats are really endian-dependent. But
> in any case, we can store them as they are stored on the i486 series
> of chips.
Simple: what we define is what the format uses. :-)
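
For what it's worth, the byte order of a stored float does differ between machines in practice, so it is worth pinning down. A quick Python check shows the two layouts of the same 32-bit value:

```python
import struct

little = struct.pack("<f", 1.0)  # byte order used on x86 chips
big = struct.pack(">f", 1.0)     # the same value, bytes reversed

print(little.hex())  # 0000803f
print(big.hex())     # 3f800000
```

Reading with an explicit "<f" gives the same result on any machine, whatever its native order.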

> Anything I've missed ?
>
> I can adapt the BWView "file" input code to handle this format, with a
> few modifications to the interface.

Just about to press the send button and had a thought.
I'd like to see the following info in the global header. It could be stored
externally, BUT then if the external file gets separated we lose the
information.
Date/time - when the recording was done
Name - of the subject, or some code to refer back to them!
- This way you could collect several subjects and easily work out whose
file is whose
Comment - just for a quick note about the session, e.g. "Entering REM sleep"
With this info and no other files, you have all that you need to know about
any file to make use of it!
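
One way this could fit into the "additional global header data" area, as a Python sketch only. The field widths (20, 32 and 64 bytes), the ASCII encoding and the function names are all my own assumptions, not part of the proposed format.

```python
import struct

# Hypothetical fixed-width fields: date/time, subject name or code, comment.
EXTRA_FMT = "<20s32s64s"


def pack_extra(when, name, comment):
    """Pack the three text fields, NUL-padded (or truncated) to their widths."""
    return struct.pack(EXTRA_FMT, when.encode("ascii"),
                       name.encode("ascii"), comment.encode("ascii"))


def unpack_extra(blob):
    """Recover the three text fields from the extra global header bytes."""
    fields = struct.unpack(EXTRA_FMT, blob[:struct.calcsize(EXTRA_FMT)])
    return tuple(f.rstrip(b"\0").decode("ascii") for f in fields)
```

With these widths the extra data is 116 bytes, so the global header length field would read 128 instead of 12, and older programs would just skip the extra bytes.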

John


