Understanding a DICOM header

 

 

  1. The DICOM image file consists of three parts.
    1. Preamble.
    2. Header
    3. Image (or more technically – pixels).
  2. The preamble carries no DICOM data of its own. It is a fixed block of 128 bytes at the very start of the file, usually all null (00H), although the standard lets applications put their own data there. Immediately after these 128 bytes come the four characters 'D', 'I', 'C', 'M'. So every DICOM file starts with a 128-byte preamble followed by the DICM prefix.
  3. The DICOM header is more complex than the headers of most other image formats. It is made up entirely of tags.
  4. The tags are laid one after another without any padding or separator character between them.
  5. Tags are unique. That is, no two tags at the same level of a file have the same number or identifier.
  6. The DICOM standard defines a very large number of tags, so the tags are divided into families. Some families are 0002, 0008, 0028 and 7FE0.
  7. Each family has its own members. For example, (0002,0010) is a tag, where 0002 is the family name or group number and 0010 is the member name or element number. The family continues with (0002,0011), (0002,0012) etc.
  8. The tag structure consists of four segments: the tag name, followed by the value representation, followed by the value length, followed by the value.
  9. Group Number + Element Number  = Tag Name.
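
The preamble rule from point 2 is easy to check in code. Here is a minimal sketch in Java, assuming the first 132 bytes of the file have already been read into an array; the class and method names (`DicomPrefixCheck`, `hasDicmPrefix`) are made up for illustration and do not come from any library.

```java
import java.nio.charset.StandardCharsets;

class DicomPrefixCheck {
    // The preamble is a fixed 128 bytes; the 4-character prefix comes right after it.
    static final int PREAMBLE_LENGTH = 128;

    /** Returns true if bytes 128..131 of the file spell "DICM". */
    static boolean hasDicmPrefix(byte[] fileStart) {
        if (fileStart.length < PREAMBLE_LENGTH + 4) {
            return false; // too short to hold the preamble plus the prefix
        }
        String prefix = new String(fileStart, PREAMBLE_LENGTH, 4, StandardCharsets.US_ASCII);
        return prefix.equals("DICM");
    }

    public static void main(String[] args) {
        // Build a fake file start: 128 null bytes followed by "DICM".
        byte[] sample = new byte[PREAMBLE_LENGTH + 4];
        byte[] dicm = "DICM".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(dicm, 0, sample, PREAMBLE_LENGTH, dicm.length);

        System.out.println(hasDicmPrefix(sample));        // true
        System.out.println(hasDicmPrefix(new byte[132])); // false
    }
}
```

The check looks only at the four prefix characters and ignores the content of the 128 preamble bytes, since those are not guaranteed to be null in every file.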

A tag name consists of two parts: the group number followed by the element number, each occupying 2 bytes, i.e. the tag name occupies 4 bytes in total.
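
As a rough illustration of this 4-byte layout, the sketch below turns four raw bytes into the usual (gggg,eeee) notation. It assumes the bytes are in little-endian order (least significant byte first), which is what the group 0002 tags use; `DicomTag`, `readUInt16LE` and `formatTag` are hypothetical names chosen for this example.

```java
class DicomTag {
    /** Combines two bytes stored least-significant-first into an unsigned 16-bit value. */
    static int readUInt16LE(byte[] data, int offset) {
        // "& 0xFF" keeps Java's signed bytes from turning into negative ints.
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    /** Formats the 4 bytes at offset as "(gggg,eeee)" in upper-case hexadecimal. */
    static String formatTag(byte[] data, int offset) {
        int group = readUInt16LE(data, offset);       // first 2 bytes: group number
        int element = readUInt16LE(data, offset + 2); // next 2 bytes: element number
        return String.format("(%04X,%04X)", group, element);
    }

    public static void main(String[] args) {
        byte[] raw = {0x02, 0x00, 0x10, 0x00};
        System.out.println(formatTag(raw, 0)); // (0002,0010)
    }
}
```

For a big-endian file the two bytes of each number would have to be combined in the opposite order.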

           

  1. These tag numbers are stored as binary values and are conventionally written in hexadecimal. When you read them, convert each byte from binary to hexadecimal. In Java this can be done with Integer.toHexString(b & 0xFF), where b is the byte just read; the mask keeps negative byte values from being sign-extended.
  2. The Value Representation (VR) follows the tag name. It is a two-letter code describing the type of data in the tag. It occupies 2 bytes. Some examples of VRs are 'OW', 'OB' etc.
  3. The VR is encoded as characters. Just read the two bytes and make a character string from them.
  4. The Value Length (VL) follows the VR. It gives the size in bytes of the Value field that follows the VL. It is stored as an unsigned integer and normally occupies 2 bytes.
  5. The VL is a binary number as well, which a hex editor shows in hexadecimal. I converted it to decimal to read the value, but it can also be used without converting.
  6. The Value field follows the VL. This is the last segment of a tag. It holds the actual value we are after. The size it occupies is given by the VL.
  7. The Value is generally read as a character string, except for some tags that hold numerical values. If you are doing this for the first time, just record the value as raw bytes and go to the next tag.
  8. So when reading a DICOM header we repeat the same steps (tag name, VR, VL, value) for one tag after another until the last tag.
  9. The first tag always belongs to the group 0002 family. A regular DICOM file has a number of group 0002 tags that describe what type of file it is, i.e. whether it is Little Endian or Big Endian, and Implicit VR or Explicit VR. (These, and the CTN file that lacks this group, are explained later.)
  10. The last tag is normally (7FE0,0010). It holds the pixels. To find this tag we have to read all the previous tags in the image file.
  11. One exception: if the VR is 'OB', 'OW', 'OF', 'SQ', 'UT' or 'UN', the VR has 2 extra bytes trailing it. These 2 bytes are reserved (empty) and are not decoded. Newer editions of the standard add more VRs to this group, such as 'OD', 'OL', 'UC' and 'UR'.
  12. When the VR has these 2 extra reserved bytes, the VL occupies 4 bytes rather than 2 bytes.
  13. The tag (0002,0010) holds the Transfer Syntax, which determines whether the file is Explicit or Implicit VR, and Little Endian or Big Endian.
  14. If the Transfer Syntax is 1.2.840.10008.1.2.1 the file is Explicit VR Little Endian, i.e. in every multi-byte number the least significant byte is stored first, so the bytes must be reversed to get the number as it is normally written.
  15. If the Transfer Syntax is 1.2.840.10008.1.2.2 the file is Explicit VR Big Endian, i.e. in every multi-byte number the most significant byte is stored first, so the bytes are already in the order in which the number is normally written.
  16. As mentioned above, there are different kinds of DICOM files. The main two are Explicit VR and Implicit VR. Simply speaking, the Explicit VR type stores a VR in every tag, whereas the Implicit VR type does not.
  17. The tag structure described so far (tag name, VR, VL, value, including the 2 extra bytes after certain VRs) is that of the Explicit VR type.
  18. The overall file layout, the preamble, and the naming of tags by group and element are common to the Implicit VR type as well.
  19. The tag structure of the Implicit VR type starts with a 4-byte tag number, followed by a 4-byte VL, followed by the Value, whose length is given by the VL.
  20. Even in Implicit VR files, the group 0002 tags (technically, the Metadata) are Explicit VR, i.e. an Implicit VR file still starts out as Explicit VR, and the metadata decides whether the rest of the file is Implicit. If it is, the implicit reading should start right after the end of the Metadata.
  21. So recording the size of the metadata is important. This information is in the first tag, (0002,0000). Caution: the length it gives is counted from the end of the value field of the tag (0002,0000) to the end of the metadata.
  22. If the Transfer Syntax is 1.2.840.10008.1.2 the file is Implicit VR Little Endian. Implicit VR is always Little Endian; there is no Big Endian Implicit VR.
  23. There is no need for a Data Dictionary (as said by Lead Tools) at first, even for Implicit VR, if all you want is to step from tag to tag; the dictionary only becomes necessary when you want to interpret the values. The 2 extra reserved bytes of Explicit VR do not apply to Implicit VR.
  24. Another main part of DICOM files is Sequences.
  25. Tags which have a VR of SQ are identified as Sequence tags (this way of identifying them doesn't apply to Implicit VR files, which store no VR).
  26. A Sequence tag has more tags encapsulated in its Value field.
  27. A Sequence tag can itself contain another Sequence tag among its encapsulated tags.
  28. So nesting to any depth is possible.
  29. The value of a Sequence tag consists of Items.
  30. Each Item starts with the tag (FFFE,E000).
  31. The Item structure is a 4-byte item tag, (FFFE,E000), followed by 4 bytes of item length, followed by the item value (the value consists of normal tags with the tag structure described earlier).
  32. If the Item Length has the value "-1" (FFFFFFFF when read as an unsigned 4-byte number), the value field has an undefined length.
  33. With an undefined length, the end of the Item is marked by the tag (FFFE,E00D).
  34. The end of a Sequence of undefined length is marked by (FFFE,E0DD).
  35. Both (FFFE,E00D) and (FFFE,E0DD) have a 4-byte tag number and a 4-byte Value Length set to zero, but no value field.
  36. Another possible DICOM file is the CTN file. It does not have the preamble: there are no empty bytes at the beginning of the file, and there is no DICM either.
  37. Surprisingly, this file doesn't start with Metadata tags either, i.e. it has no group 0002 tags at all.
  38. It starts with group 0008 tags.
  39. It is Implicit VR Little Endian.
  40. The rest of it is the same as Implicit VR.
  41. Multiframe images: a DICOM file can contain multiple frames. The number of frames is given in the tag (0028,0008).
  42. Tips for reading pixels:
    1. Suppose pixels[] is the array into which you will read the pixels; for a single-frame grayscale image its length should be equal to (rows x columns).
    2. For multiframe images, the next frame starts after every (rows x columns) pixels. This continues up to ((rows x columns) x number of frames).
    3. Another tip: if you are given an undefined length for the pixels tag, just read (rows x columns) bytes for an 8-bit image, (rows x columns) x 2 bytes for a 16-bit image and so on. Be aware that in conforming files an undefined length on the pixels tag means the pixels are stored compressed in fragments, so this shortcut only works on files that store raw pixels anyway.
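
Stepping back to the tag-by-tag reading described in the list above, the Explicit VR case can be sketched in Java roughly as follows. This is only an illustration under some assumptions: the data is Explicit VR Little Endian, reading starts at a tag boundary (for a real file, after the preamble and DICM), values are kept as raw bytes, and the loop simply stops when it meets an undefined length instead of descending into sequences. The names `ExplicitVrReader`, `Element` and `readElements` are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplicitVrReader {
    // VRs that carry 2 reserved bytes after the VR and then a 4-byte VL.
    static final List<String> LONG_VRS = Arrays.asList("OB", "OW", "OF", "SQ", "UT", "UN");

    /** One decoded tag; the value is kept as raw bytes. */
    static final class Element {
        final int group;
        final int element;
        final String vr;
        final long length;
        final byte[] value;

        Element(int group, int element, String vr, long length, byte[] value) {
            this.group = group;
            this.element = element;
            this.vr = vr;
            this.length = length;
            this.value = value;
        }

        @Override
        public String toString() {
            return String.format("(%04X,%04X) %s length=%d", group, element, vr, length);
        }
    }

    /** Reads tag, VR, VL and value repeatedly, starting at byte offset start. */
    static List<Element> readElements(byte[] data, int start) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(start);
        List<Element> elements = new ArrayList<>();
        while (buf.remaining() >= 8) {
            int group = buf.getShort() & 0xFFFF;   // 2 bytes: group number
            int element = buf.getShort() & 0xFFFF; // 2 bytes: element number
            byte[] vrBytes = new byte[2];
            buf.get(vrBytes);                      // 2 bytes: VR as characters
            String vr = new String(vrBytes, StandardCharsets.US_ASCII);
            long length;
            if (LONG_VRS.contains(vr)) {
                if (buf.remaining() < 6) {
                    break;                         // truncated input
                }
                buf.getShort();                    // 2 reserved bytes, not decoded
                length = buf.getInt() & 0xFFFFFFFFL; // 4-byte unsigned VL
            } else {
                length = buf.getShort() & 0xFFFF;  // 2-byte unsigned VL
            }
            if (length == 0xFFFFFFFFL || length > buf.remaining()) {
                // Undefined length or truncated value: record the tag and stop.
                elements.add(new Element(group, element, vr, length, new byte[0]));
                break;
            }
            byte[] value = new byte[(int) length];
            buf.get(value);
            elements.add(new Element(group, element, vr, length, value));
        }
        return elements;
    }

    public static void main(String[] args) {
        byte[] data = {
            0x08, 0x00, 0x60, 0x00, 'C', 'S', 0x02, 0x00, 'C', 'T',
            (byte) 0xE0, 0x7F, 0x10, 0x00, 'O', 'W', 0x00, 0x00,
            0x04, 0x00, 0x00, 0x00, 1, 2, 3, 4
        };
        for (Element e : readElements(data, 0)) {
            System.out.println(e); // (0008,0060) CS length=2, then (7FE0,0010) OW length=4
        }
    }
}
```

Keeping every value as raw bytes follows the first-time advice above: the reader only needs the VL to hop to the next tag, and decoding can be added per VR later.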