Re: [buildcheapeeg] EDF and DDF file formats

From: Sar Saloth (sarsaloth_at_yahoo.com)
Date: 2002-03-13 05:23:25


At 12:13 AM 2002-03-13 +0000, you wrote:
>Sar Saloth wrote:
> > As an important note, I was not thinking of implementing a full XML parser
> > with a complex nested format. It is possible to maintain XML syntax from
> > an essentially "flat" structure. I don't know the real software term for
> > flat, but I mean something that wouldn't have nested records (nodes?).
>
>I know what you mean. But it is better not to call it XML, then,
>because XML has loads of requirements that you aren't going to be
>keeping to. For example, if the character encoding was UTF-8, then
>you can't just dump binary data into the file, because that is not
>legal UTF-8. Officially, you'd need to encode it in base64 or
>something. Really you're talking about a kind of improvised tagged
>ASCII format.

Exactly, but one that might loosely look like XML. Can you think of a good
name? BXML? (bastardized or binary?)

> > I vowed in the future to leave the low-level bit-twiddling to
> > low-level people and let the programmers have simple human readable
> > text commands.
>
>I agree, I think that is a good idea.
>
> > > >- Someone will have to write a class or library to read/write/seek
> > > > around this kind of file.
> >
> > "Standard" methods for working with XML do that. It is essentially
> > traversing a tree.
>
>Only if you are willing to drag it all into memory, though, surely ?
>I think there are ways to stream XML data without pulling it all into
>memory at once, but still we would need a lot more code to handle a
>true XML-encoded data stream than for a more simple binary format.

On a practical level - YES! - for control and status I would definitely
handle the "XML" only on a streamed basis (I want this to be very
embedded-friendly).

>If it was just the descriptive information stored in XML, rather than
>the raw samples, then it wouldn't be such a problem to load it all
>into memory. (e.g. if we have two files: a descriptive XML file + a
>binary data file)
>
> > One of the simplifications for this problem is that usually only one
> > piece of hardware collects the high data rate information and the
> > other pieces can collect the low rate information. In this case,
> > resampling won't cause much error in the slow signals as long as the
> > re-synchronizing adds only the jitter equivalent to the fast sample
> > rate and NOT the slow one.
>
>Neither EDF nor the binary format that I suggested can handle this --
>they both assume that the sampling rates are locked in a fixed ratio.
>Even if the jitter was okay, the drift would make problems, because at
>some point you'd have either one sample too many, or one sample too
>few to fill in the data chunk, if I understand what you are saying
>right.
>
>I think there has to be synchronisation between the sampling rates, or
>else we have to think of a completely different approach, and much
>more complex analysis and storage. (Actually, you say exactly this
>later on, so I think we are agreeing)

Yes, I am agreeing. I was trying to elaborate on a case I saw where some
interface programmers had (without telling anyone) devised a scheme to drop
the occasional sample to handle drift between two unsynchronized machines
operating at the same nominal rate. You are right that drift due to a
slight frequency error is much more significant than jitter. My point
about jitter was that if you secretly drop samples and your source data
rate isn't much higher than your final decimated data rate, you end up
with a lot of jitter that will really mess up any frequency analysis.
Sorry, that is just a bunch of hot air for this group, but it has stuck in
my craw ever since I was unable to explain the problem to anyone.

> > 2. If a sample is bad, either the suggested method or how about this
> > suggestion: choose a number outside of the maximum or minimum
> > binary level to signify a bad sample. Of course that means that with
> > a 16-bit converter you would have to not use one or two of the 65536
> > codes. I was thinking of leaving two unused codes, one for a bad
> > sample and one for lead-off. Or would lead-off be better handled by
> > the annotation stream?
>
>This causes problems if we're converting from existing stored data
>that uses the full 16-bit range. Did you see my suggestion that we
>store a separate error channel of 1 bit per sample to keep error
>flags ? This wouldn't be per-channel, though.

I haven't quite gotten my head around the implications of that. It
certainly would be space-efficient, and we could store it as just another
channel with a special label, so that it wouldn't break current EDF
readers. I see very few disadvantages to such a scheme.
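
To make that concrete, here is a rough sketch of one way the packing
could work. The idea of 16 flags per 16-bit word and the notion of an
extra "flags" channel are my own assumptions for illustration; EDF itself
defines nothing like this.

/* Rough sketch (my own assumption, not part of EDF): pack one error bit
 * per data sample into the 16-bit words of an extra "flags" channel, so
 * existing EDF readers just see one more ordinary signal.  Bit i of
 * word w corresponds to sample (w*16 + i) within the data record. */
#include <stdint.h>
#include <stddef.h>

/* Set or clear the error flag for a given sample index. */
static void set_error_flag(uint16_t *flag_words, size_t sample, int bad)
{
    size_t   word = sample / 16;
    uint16_t mask = (uint16_t)(1u << (sample % 16));

    if (bad)
        flag_words[word] |= mask;
    else
        flag_words[word] &= (uint16_t)~mask;
}

/* Test the error flag for a given sample index. */
static int get_error_flag(const uint16_t *flag_words, size_t sample)
{
    return (flag_words[sample / 16] >> (sample % 16)) & 1u;
}

The cost is one extra 16-bit word per 16 samples, which is what makes the
scheme so space-efficient compared with reserving codes in every channel.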

>For floating point data, it is possible to store NaN (not-a-number) as
>a value, which would do what you suggest.
>
> > Is anything else reasonable? Does loss of synch mean that the
> > serial port couldn't keep up? I have been given the impression
> > that modern PCs should be able to handle 115 kbaud. If the loss is a
> > very rare event, then the data could reasonably be considered
> > corrupt. If data loss is frequent, don't we have a
> > reliability problem?
>
>I suppose, as you say, we could take any sync loss as a complete break
>in the signal, and do something like DDF, and start a new segment of
>the file, or even start a new file altogether. That would save us
>storing error information. It is an option.
>
>I think the important thing, though, is that we do *something* that
>will alert the user -- whether that is breaking the file at that
>point, or flagging an error in an error channel. The worst thing
>would be to just insert zeros and hide the error.

I agree. It is essential that the user and ANY analysis software know
about potentially flawed data within an otherwise useful data set. That is
already standard practice in some other fields - the analysis, depending on
the algorithm, will ignore the bad data.

In fact, this relates to something I see as a big shortcoming of EDF. I
don't think it is reasonable that the data have no method of
self-consistency check to mitigate the possibility of data corruption. A
CRC is an excellent example. In 1992, when EDF was created, such things
weren't considered as important, but nowadays, with standards for
processing medical data, they are highly recommended. This could be
another special channel, like the "loss of synch" or "data bad" flags
above. If it is implemented like this, there should be the option of
using either a CRC or something computationally simpler, such as a
bit-wise XOR.
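
For the sake of discussion, here is a minimal sketch of both options in
C. The choice of CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF),
and the idea of running it over one data record at a time, are my own
assumptions; EDF defines no checksum, and where the result would be
stored is still an open question.

/* Minimal sketch of a per-data-record integrity check.  CRC-16/CCITT
 * (polynomial 0x1021, init 0xFFFF) is assumed here for illustration;
 * EDF itself defines no checksum. */
#include <stdint.h>
#include <stddef.h>

static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;                 /* common initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* The computationally cheaper alternative mentioned above: a plain
 * byte-wise XOR checksum (it catches far fewer errors than a CRC). */
static uint8_t xor_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[i];
    return sum;
}

Either could be computed over each data record as it is written and
stored in the special channel, so a reader can verify records
independently.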



