Re: [buildcheapeeg] EDF and DDF file formats

From: Jim Peters (jim_at_uazu.net)
Date: 2002-03-12 23:27:46


Dave wrote:
> But is there any reason to use the first kind of file (raw storage)
> for anything other than simulation purposes?

If we can convert our raw data from the device into a storage format
*without any data loss*, then I can see no problem. This does mean
storing error information, though.

> From what I can tell of BioGraph, they grab the entire session's
> data and hold it in RAM. I like your method, Jim-P, where it looks
> like you grab a buffered "chunk" from the file in BWView and then
> only grab another chunk if you need it as you move forward and
> backwards through the data. However, if we felt that it was
> "affordable" in terms of RAM resources, it would be both easier and
> faster to index a stored array (vector, or whatever) than to seek
> and reposition into a data file; and, if the amount of data exceeds
> the current RAM resources of the machine, let the swapper take care
> of the rest.

It doesn't seem scalable to me to store the whole thing in memory. It
is just asking for trouble. Also, when I'm doing analysis in BWView,
I need large amounts of memory for the analysis buffers as well.

Your idea of using the swapper could work well on Linux, because you
could just mmap() the file into memory (assuming it was in a regular
format), but I'm not sure about Windows.
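Just to illustrate the Linux side -- a minimal sketch, assuming a
flat file of 16-bit samples; map_session() is my own name for it,
not an existing call:

    /* Map a raw sample file into memory so the kernel's pager
       decides what stays in RAM.  Assumes a flat file of 16-bit
       samples. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int16_t *map_session(const char *path, size_t *n_samples)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        void *p;

        if (fd < 0) return NULL;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; }
        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping survives close() */
        if (p == MAP_FAILED) return NULL;
        *n_samples = st.st_size / sizeof(int16_t);
        return (int16_t *)p;
    }

After that, indexing the returned array reaches any point in the
recording, and the kernel pages data in and out as you move around --
no explicit chunk buffering needed.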

> But wouldn't these samples coming from multiple devices still be
> based on the same "chunked interval" based on the computer's clock?
> In that sense, I was thinking that the data would be stored and
> recorded as multiple channel data, regardless of whether they came
> from one device or ten.

If you have one device sending out 256Hz samples, and another also
sending out 256Hz, there is almost guaranteed to be drift between
them. This means having extra samples in some chunks but not others,
and the whole thing quickly becomes a huge mess to analyse and store.
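To put rough numbers on it (assuming typical ~100ppm crystal
oscillators in each device, so up to ~200ppm relative error -- a
ballpark tolerance, not a measured figure):

    256Hz * 200e-6 = 0.0512 samples/sec of relative drift
    0.0512 * 3600  = ~184 samples gained or lost per hour

So the two streams can slip by a whole sample every 20 seconds or so.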

Really this chunked idea is only going to work when a single recording
device does all the sampling -- like the ProComp, sending out one
sample of EMG data for every 8 samples of EEG data, guaranteed.

Since the EDF format has no way to account for drift, they must be
making this assumption too.
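For anyone who hasn't dug into the spec: the EDF header fixes the
record duration and each signal's samples-per-record for the whole
file, which is exactly why drift can't be expressed. A sketch of the
fixed part as a C struct -- all fields are space-padded ASCII:

    /* The fixed 256-byte EDF header.  A further 256 bytes per signal
       follow, including each signal's (fixed) samples-per-record
       count -- so every data record must contain exactly the same
       number of samples.  No room for drift. */
    struct edf_header {
        char version[8];       /* "0"                            */
        char patient_id[80];
        char recording_id[80];
        char startdate[8];     /* dd.mm.yy                       */
        char starttime[8];     /* hh.mm.ss                       */
        char header_bytes[8];  /* 256 + 256 * number of signals  */
        char reserved[44];
        char num_records[8];   /* -1 while still recording       */
        char duration[8];      /* seconds per data record        */
        char num_signals[4];
    };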

> I have a question which comes from the work I have been doing with
> the ProComp. There is no way to know just how much data has been
> lost when I lose the sync byte. I might be able to make certain
> inferences if I assume that only one packet set was lost (they
> transmit a total of 144 bytes in each set, which represents 24
> samples of EEG data from channels A&B and 3 samples of other
> biosensory data such as GSR, HR, etc.). But if more than one set is
> lost, then I
> could not even compute an estimate of how much data might be missing
> from the time of the last sync. Thus, all I could do in the above
> scenario is set an error bit the moment I realize that I am no
> longer reading valid data, wait for the sync, and resume. Is that
> what you had in mind for situations such as this?

Yes, that is what I was thinking. Just do your best, and make sure
the user knows there was a problem (by setting the error bit at that
point).
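In rough C, something like the sketch below -- SYNC_BYTE is a
placeholder, not the real ProComp value, and the 144-byte set size
is from your figures:

    /* On losing sync, flag the gap (error bit) and scan forward for
       the next sync byte.  Assumes the sync value is reserved and
       never appears inside the data. */
    #include <stdio.h>

    #define SYNC_BYTE 0xAA    /* placeholder -- check real protocol */
    #define SET_BYTES 144     /* one packet set, per your figures   */

    int main(void)
    {
        unsigned char set[SET_BYTES];
        int c, i;

        while ((c = getchar()) != EOF) {
            if (c != SYNC_BYTE) {             /* lost sync: gap of  */
                fprintf(stderr, "gap\n");     /* unknown size, so   */
                                              /* set the error bit  */
                while ((c = getchar()) != EOF && c != SYNC_BYTE)
                    ;                         /* discard to next sync */
                if (c == EOF) break;
            }
            set[0] = (unsigned char)c;
            for (i = 1; i < SET_BYTES; i++) {
                if ((c = getchar()) == EOF) return 0;
                set[i] = (unsigned char)c;
            }
            /* ...decode 24 EEG samples + 3 slow samples from set[]... */
        }
        return 0;
    }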

> What would our minimum resolution be (if any)? Would we ever
> have, say, < 1 second, and thus have several "chunks" per second?

Yes, definitely. For ProComp, you could use 1/32 sec, which gives 8
EEG samples and 1 of each of the other samples.
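As a concrete (illustrative, not final) layout, such a chunk might
look like this in C -- the field names are mine:

    /* One 1/32-sec chunk for the ProComp case: 8 samples per EEG
       channel (256Hz) plus one sample of each slow channel (32Hz). */
    #include <stdint.h>

    struct chunk_1_32s {
        int16_t eeg_a[8];   /* EEG channel A, 256Hz            */
        int16_t eeg_b[8];   /* EEG channel B, 256Hz            */
        int16_t slow[3];    /* GSR, HR, etc., 32Hz             */
        uint8_t error;      /* error bit: data lost before here */
    };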

> Is that why you have chosen to represent this as a floating point
> value?

I used floating point because there may be some device out there
that basically samples at 256Hz but sends out lower-rate info once
every 10 samples -- i.e. at 1/25.6-sec intervals, which isn't a nice
fraction.

> And what determines this duration when the file is being stored? Is
> that something we need to configure via software options?

I think you can make it as small as is sensible for your device. This
could be hard-coded -- e.g. 1/32 sec for the ProComp. For pure EEG
devices, we could send out one sample at a time, i.e. use 1/256 sec
intervals. I can't see a problem with that (although in EDF they seem
to prefer longer blocks).

> Also, could you speak to the issues surrounding the A/D clock and
> the role it plays in all of this? I am fuzzy on this issue, and
> came across several references as I was reading the EDF material,
> and you have a very good grasp of this aspect.

I'm not sure what you mean -- but as I mentioned above, this whole
idea is only going to work if all the sampling is being done off the
same clock, otherwise different signal sample rates will drift, and it
becomes hard to keep to a simple format. The format would have to be
quite different if we were really going to handle any number of
inputs, each with their own independent sampling clock.

You get the same problem working with audio -- it is a disaster (with
phasing effects and all sorts) if you try to record audio
simultaneously using separate unsynchronised sound cards.

> Otherwise, I love the format being fleshed out. The only thing I
> would add is to situate it in a directory structure similar to the
> one used by BioGraph. It is a nicely organized way to group data.
> Perhaps something like this:
>
> <root data directory>
>   <clientID>
>     <yyyymmdd.nnn>
>       session data files for one recording
>
> where 'yyyy' is the year, 'mm' the two-digit month, 'dd' the
> two-digit day, and 'nnn' is a sequence number based on the number
> of recorded sessions that day. This arrangement makes it sort
> nicely in directory displays based on date.

Sounds good to me.
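For what it's worth, a sketch of building that path with strftime()
-- make_session_dir() is just an illustrative name, and the caller
supplies the sequence number:

    /* Build <root>/<clientID>/<yyyymmdd.nnn> for a new session. */
    #include <stdio.h>
    #include <time.h>

    void make_session_dir(char *out, size_t len, const char *root,
                          const char *client, int seq)
    {
        char date[9];                          /* "yyyymmdd" + NUL */
        time_t now = time(NULL);
        strftime(date, sizeof date, "%Y%m%d", localtime(&now));
        snprintf(out, len, "%s/%s/%s.%03d", root, client, date, seq);
    }

e.g. make_session_dir(path, sizeof path, "/data", "client42", 1)
would give "/data/client42/20020312.001".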

Jim

-- 
Jim Peters                                     jim_at_uazu.net
Uazú                                           http://uazu.net
B'ham, UK

