Re: File validity stuff - Re: [buildcheapeeg] EDF and DDF file formats

From: Jim Peters (jim_at_uazu.net)
Date: 2002-03-14 19:28:50


Sar Saloth wrote:
> Yes, of course we wouldn't ask anyone to rewrite the firmware in
> their working devices. However, if we agree on a flexible
> communications format then it should be a simple and nicely
> self-contained bit of code with well defined interfaces to do the
> conversion. Is that a silly statement due to my lack of modern
> programming experience? - it might be. Hopefully future products
> could use the flexible method of communication.
>
> Yes, I am guilty in the above paragraph of trying to use a
> communications protocol as an interface specification between bits
> of code. Is that a bad thing? (that is a real question, not a
> rhetorical one).

I think it would be hard to design a completely universal protocol for
a serial connection, because if you look at the ProComp format, they
are carefully offsetting all the low-rate channels through the
8-packet sequence so that there is never too much data going down the
wire at any one time. This is obviously hand-optimised.

I think that designing a universal serial protocol would be a lot more
work.

However, if you are thinking of using USB or TCP/IP and a much faster
data rate, you have a lot more headroom in the data stream, so it
doesn't matter so much if your first packet in a sequence of 8
contains data for channels ABCDEFGH, and all the remaining 7 packets
only contain AB. With a serial connection, the time to send that
first packet might make all the others late, but using a much faster
connection, there isn't such a worry.

But then the question is whether your hardware can actually generate
data values for all channels A-H in one sample-period ...

Here is one idea for a 'universal' format. You could have the
hardware tag each data value with the channel it belongs to, so you
might get a data stream like this ("aa aa", "bb bb", etc. are the
signal data value bytes):

01 aa aa 02 bb bb 03 cc cc
01 aa aa 02 bb bb 04 dd dd
01 aa aa 02 bb bb 05 ee ee
01 aa aa 02 bb bb 06 ff ff
01 aa aa 02 bb bb 07 gg gg
01 aa aa 02 bb bb 08 hh hh
01 aa aa 02 bb bb 09 ii ii
01 aa aa 02 bb bb 0A jj jj

(Plus you would need all your CRCs and packet wrappers and so on).

The receiving code would look at the first byte of each group of three
bytes to see which channel the data belongs to, and add it to that section of the
buffer. If it got too many '01's or not enough '06's or whatever, it
would have to assume corruption, but it should all work fine so long
as the sending hardware keeps to the rules.
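To make the receiving side concrete, here is a minimal sketch of
demultiplexing the tagged stream above. It assumes each value is
exactly three bytes (a one-byte channel tag plus a two-byte sample,
taken as big-endian here), and that CRCs and packet wrappers have
already been stripped. `demux_tagged`, `check_counts` and the
"expected counts" table are hypothetical names for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def demux_tagged(stream: bytes) -> dict[int, list[int]]:
    """Split a tagged byte stream into per-channel sample lists.

    Assumed layout: [tag][hi][lo] repeated, one value per 3 bytes.
    """
    if len(stream) % 3 != 0:
        raise ValueError("stream length is not a multiple of 3 bytes")
    channels: dict[int, list[int]] = defaultdict(list)
    for i in range(0, len(stream), 3):
        tag = stream[i]
        value = int.from_bytes(stream[i + 1:i + 3], "big")
        channels[tag].append(value)
    return dict(channels)


def check_counts(channels: dict[int, list[int]],
                 expected: dict[int, int]) -> list[int]:
    """Return the tags whose sample count differs from what the sender
    promised per cycle (too many '01's, not enough '06's, ...).
    A non-empty result means the receiver should assume corruption."""
    tags = set(channels) | set(expected)
    return sorted(t for t in tags
                  if len(channels.get(t, [])) != expected.get(t, 0))
```

For the 8-line example above, the expected table would be 8 values each
for tags 01 and 02, and 1 value each for tags 03 to 0A.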

This very simple idea would probably have to be extended to handle
different sized data chunks and sync bits or whatever, but perhaps
this is closer to what you actually want, as a universal format for
sending all data from your hardware.

> For reasons of dealing with FDA and regulators etc. I will
> personally keep that CRC and sequence junk etc. in my data file, but
> I understand why you don't want it (because functionally it is
> useless). Personally I think that a bit of disk space isn't much of
> an issue these days.

Yes, I can see no problem with this idea. My only objection is to
forcing both formats to be the same, when it is probably better not to
have them *exactly* the same.

> As far as the annotation stream goes, if the device had no annotations
> to send, it couldn't send any even if it wanted to. If it was an event, maybe
> the event codes would be more efficient? Unfortunately, the event
> code date-time stamp still looks like it consumes a lot of
> characters. It is especially bad if we make one record be one
> sample to minimize latency as the annotations (or event stream or
> anything) will then take as much space as one channel.

I think in EDF the annotation text continues into later data records
until all the characters have been written. So it doesn't matter if
this scheme allows only 4 characters of annotation per data record,
because your annotation text can spread across the next 8-10 records.
The only limitation of this method is that you can't have too many
annotations, or else you run out of space to write them all. If two
come too close together, you write the next one immediately after the
current one, even if that makes it late; the date/time-stamp shows
when it really happened.
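The spreading described above can be sketched as a simple queue. This
is a simplified model of the idea, not the actual EDF+ annotation
encoding: each data record is assumed to have a fixed 4-byte
annotation slot, each annotation is assumed to end with a newline, and
`schedule_annotations` is a hypothetical name.

```python
from collections import deque


def schedule_annotations(texts, n_records, slot_size=4, pad=b"\x00"):
    """Spread queued annotation text across fixed-size per-record slots.

    Text that doesn't fit continues into the next record; a later
    annotation waits until earlier ones are written (its own timestamp,
    not shown here, would record when it really happened).
    """
    # Hypothetical terminator: newline after each annotation.
    pending = deque(b"".join(t.encode("ascii") + b"\n" for t in texts))
    slots = []
    for _ in range(n_records):
        take = min(slot_size, len(pending))
        chunk = bytes(pending.popleft() for _ in range(take))
        # Unused space in this record's annotation slot is padded.
        slots.append(chunk + pad * (slot_size - take))
    # Anything still queued did not fit: the "run out of space" case.
    return slots, bytes(pending)


slots, leftover = schedule_annotations(["Eyes closed", "Blink"], 5)
# slots spread "Eyes closed\nBlink\n" over 5 records, 4 bytes each
```

If annotations arrive faster than the slots can carry them, `leftover`
grows, which is the peak-rate limitation discussed next.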

> If the device has no annotations to write this is not an issue. If
> there are annotations, then the data rate taken in EDF is a function
> of the PEAK information rate (how quickly your annotations can
> come).

Agreed.

I'm now quoting from the other message, to avoid having too many
messages going at once!

> > ?? data for first channel
> > ?? data for second channel
> > :: (etc)
> > ?? data for last channel
> > ?? padding if for some reason the data chunk length is
> > greater than sum of channel byte-counts
> >} x (data chunks repeated to end of file)
>
> If each data chunk consists of one sample per channel, then nothing in the
> record depends on anything later in the record. Doesn't that mean you
> could start writing the data as soon as you got it?

Yes, of course. The only thing is that for slower-rate data streams
like temperature sensors/etc you're not going to be able to have a
one-sample-length data record.
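The shortest usable record length follows from the channel rates: every
channel needs a whole number of samples per record, so for integer
rates the minimum record duration is 1/gcd(rates) seconds. A minimal
sketch, assuming integer sample rates; the rates in the example are
illustrative only, not taken from any device's documentation.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def min_record_layout(rates_hz):
    """Shortest record duration (seconds) giving every channel a whole
    number of samples, plus the samples per record for each channel.

    Assumes positive integer sample rates.
    """
    g = reduce(gcd, rates_hz)
    duration = Fraction(1, g)
    return duration, [r // g for r in rates_hz]


# Two fast channels at 256 Hz and one slow one at 32 Hz:
# records of 1/32 s holding 8, 8 and 1 samples.
# Add a 1 Hz temperature channel and records must be a full second long.
```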

> > Chan-A Chan-B Chan-C
> > Chan-A Chan-B Chan-D
> > Chan-A Chan-B Chan-E
> > Chan-A Chan-B Chan-F
> > Chan-A Chan-B Chan-G
> > Chan-A Chan-B Chan-H
> > Chan-A Chan-B (other data)
> > Chan-A Chan-B (other data)
> >
> >As you can see, you get 8 samples of Chan-A/B for every one sample of
> >Chan-[C-H]. Neither EDF nor my format would be any good for this (or
> >at least they would force you into using 1/10 sec chunks for
> >transmission).
>
> Can you point me to the specification? Is that a fixed order or is
> that specified in a header?

I got this out of Dave's code, and I think he based it on the docs
that came with his equipment. I'll leave it up to him if he wants to
post the link to his source files.

I believe it is a fixed order. Actually this set of 8 packets repeats
3 times over, and the "(other data)" part is different each time.
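If the order really is fixed, decoding one 8-packet cycle is just a
lookup table. A minimal sketch based only on the layout quoted above:
each packet is assumed to arrive as an `(a, b, extra)` tuple, and the
two "(other data)" slots are skipped, since their contents change
across the 3 repeats and aren't described here. `SCHEDULE` and
`decode_cycle` are hypothetical names.

```python
# Slot order from the quoted layout: A and B in every packet, then one
# of C..H in packets 0-5, and "(other data)" (skipped) in packets 6-7.
SCHEDULE = ["C", "D", "E", "F", "G", "H", None, None]


def decode_cycle(packets):
    """Demultiplex one 8-packet cycle into per-channel sample lists."""
    if len(packets) != len(SCHEDULE):
        raise ValueError("expected a full 8-packet cycle")
    out = {name: [] for name in "ABCDEFGH"}
    for slot, (a, b, extra) in zip(SCHEDULE, packets):
        out["A"].append(a)
        out["B"].append(b)
        if slot is not None:
            out[slot].append(extra)
    return out
```

The result is 8 samples each for A and B and one each for C to H,
which is exactly the 8:1 rate ratio that neither EDF nor the proposed
format handles without long records.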

> >The idea of the format I suggested was just to keep all the fixed
> >binary stuff in one file, and all the variable text stuff in a second
> >file next to it, probably in some kind of structured text format
> >(i.e. easily editable).
>
> Yes that would be a very nice and clean way to handle annotation and
> events. What is the advantage of the header information in a
> separate file over the EDF header except for some wasted space? The
> EDF header already handles signals, labels, electrode location,
> calibration (numerical to physical) etc.. I agree that parsing it
> is a little bit of a pain compared to some nicely labeled text field
> in a structured text format.

Actually, it would be easier to code for EDF's format, because
everything is fixed, but it doesn't give anyone space to write proper
notes, or put additional information in there that wasn't already
allowed for. Also, a separate text-based file could contain an
endless number of annotations and events, with much longer
descriptions if necessary.

However, if EDF is "good enough", then all this extra flexibility
wouldn't be worth the cost of the extra code and the trouble of having
two files instead of one.
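On the calibration point raised above: EDF stores integer samples and
converts them to physical units with a linear map defined by each
signal's physical minimum/maximum and digital minimum/maximum header
fields. Keeping those meanings identical is what makes translation
between formats easy. A minimal sketch (function names are
illustrative):

```python
def digital_to_physical(d, dig_min, dig_max, phys_min, phys_max):
    """Linear EDF-style calibration from stored integer to physical units."""
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    return phys_min + (d - dig_min) * gain


def physical_to_digital(p, dig_min, dig_max, phys_min, phys_max):
    """Inverse map, rounded and clamped to the stored integer range."""
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    d = round((p - phys_min) / gain + dig_min)
    return max(dig_min, min(dig_max, d))
```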

> If we keep the meanings of labels and values and constants
> consistent with EDF then translating between the two shouldn't be
> such a big deal. Those are the things that I would really like to
> keep the same as EDF.

We can certainly map from one to the other, but if we allow more space
in the text file, then we have to truncate the fields on mapping back
to EDF, and discard any extra information we have.
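The truncation on the way back is easy to make explicit. EDF header
fields are fixed-width, space-padded ASCII (for example 16 characters
for a signal label, 8 for the physical dimension, 80 for the patient
and recording identification). A minimal sketch that fits free text
into such a field and reports what would be lost; the dictionary keys
and `to_edf_field` are hypothetical names, and replacing non-ASCII
characters with '?' is an assumption.

```python
# Widths (in ASCII characters) of a few EDF header fields.
EDF_FIELD_WIDTHS = {
    "patient_id": 80,
    "recording_id": 80,
    "label": 16,
    "transducer": 80,
    "physical_dimension": 8,
    "prefiltering": 80,
}


def to_edf_field(name, text):
    """Return (field, discarded): the space-padded fixed-width field and
    any text that did not fit and would be dropped in the EDF copy."""
    width = EDF_FIELD_WIDTHS[name]
    # Non-ASCII characters become '?' (assumed policy).
    ascii_text = text.encode("ascii", "replace").decode("ascii")
    return ascii_text[:width].ljust(width), ascii_text[width:]
```

A converter could log `discarded` so the extra information is at least
reported rather than silently lost.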

Jim

-- 
Jim Peters (_)/=\~/_(_) jim_at_uazu.net
(_) /=\ ~/_ (_)
Uazú (_) /=\ ~/_ (_) http://
B'ham, UK (_) ____ /=\ ____ ~/_ ____ (_) uazu.net
