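WSI to DICOM Converter: Tutorial and Examples for Converting SVS Whole Slide Images to DICOM

This tutorial shows how to use the WSI to DICOM Converter command-line tool to convert digital pathology whole slide images (WSI) in SVS format to the standard DICOM image format.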

Downloading and Building WSI to DICOM Converter

Download the source code from the WSI to DICOM Converter GitHub repository:

# Download the WSI to DICOM Converter source code
git clone https://github.com/GoogleCloudPlatform/wsi-to-dicom-converter.git

Change into the project directory:

# Enter the project directory
cd wsi-to-dicom-converter/

On Ubuntu Linux, you can build the tool automatically with the provided build script:

# Automatic build (Ubuntu Linux)
sudo ./cloud_build/ubuntuBuild.sh

When the build finishes, it produces a wsi2dcm executable, which you can then use to convert WSI images.
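
Before moving on, it can help to confirm that the shell can actually find the new executable. The following is a minimal sketch; the helper name have_tool is made up for illustration, and whether the build script places wsi2dcm on your PATH is an assumption you should check on your own system.

```shell
# Return 0 if the named tool is on PATH, non-zero otherwise.
# (have_tool is a hypothetical helper, not part of wsi2dcm.)
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool wsi2dcm; then
  echo "wsi2dcm found at: $(command -v wsi2dcm)"
else
  echo "wsi2dcm is not on PATH; locate the executable the build produced" >&2
fi
```

If the tool is not found, either call it by its full path or add its directory to PATH before running the examples that follow.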

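Converting a WSI Image from SVS to DICOM

The tests directory of the WSI to DICOM Converter source tree contains a sample WSI image, CMU-1-Small-Region.svs, which we use here to demonstrate the conversion. The commands below refer to the file by name only, so run them from a directory that contains it or adjust the path. First, create a directory to hold the output DICOM images:

# Create the output directory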
mkdir output

Next, use wsi2dcm to convert the SVS image to DICOM. The --input option specifies the input SVS file, --outFolder specifies the output directory, and --seriesDescription sets the content of the DICOM (0008,103E) Series Description tag:

# Convert a WSI image from SVS to DICOM
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region

[2021-10-26 18:02:09.913615] [0x00007f59f5fdca40] [info]    dicomization is started
[2021-10-26 18:02:09.913726] [0x00007f59f5fdca40] [warning] threads parameter is less than 1, consuming all avalible threads
[2021-10-26 18:02:09.913787] [0x00007f59f5fdca40] [warning] batch parameter is not set, batch is unlimited
[2021-10-26 18:02:10.280053] [0x00007f59f5fdca40] [info]    dicomization is done

After the conversion finishes, the output directory contains a DICOM file named level-0-frames-0-30.dcm.
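
A quick sanity check on the result is to look for the "DICM" marker that DICOM Part 10 files carry at byte offset 128, after the 128-byte preamble. The sketch below only confirms that the file header looks like DICOM; it does not validate the content. The function name is_dicom_file is made up for this example.

```shell
# Return 0 if the file has the DICOM Part 10 "DICM" marker at byte offset 128.
# (is_dicom_file is a hypothetical helper name.)
is_dicom_file() {
  [ -f "$1" ] || return 1
  magic=$(dd if="$1" bs=1 skip=128 count=4 2>/dev/null)
  [ "$magic" = "DICM" ]
}

# Check every .dcm file in the output directory, if there are any.
for f in output/*.dcm; do
  [ -e "$f" ] || continue
  if is_dicom_file "$f"; then
    echo "OK   $f"
  else
    echo "FAIL $f"
  fi
done
```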

Tile Size

To change the size of each tile in the output DICOM image, use the --tileHeight and --tileWidth options to set the tile height and width, respectively:

# Set the tile size to 400x400
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --tileHeight 400 \
  --tileWidth 400
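
The tile size determines how many tiles are needed to cover the image. The sketch below estimates that count, assuming partially filled tiles at the right and bottom edges each count as a full tile. The 2220 x 2967 image size is an assumed figure for the sample slide, not something stated above, and the function names are made up for illustration.

```shell
# Tiles needed along one axis: ceiling of (length / tile size).
tiles_along() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Total tiles for an image of WIDTH x HEIGHT using TILE_W x TILE_H tiles.
# Usage: tile_count WIDTH HEIGHT TILE_W TILE_H
tile_count() {
  cols=$(tiles_along "$1" "$3")
  rows=$(tiles_along "$2" "$4")
  echo $(( cols * rows ))
}

# Assumed image size 2220 x 2967 with 400 x 400 tiles: 6 columns x 8 rows.
tile_count 2220 2967 400 400   # prints 48
```

Under the same assumptions, a 500 x 500 tile would give 5 x 6 = 30 tiles. That would agree with the frames-0-30 part of the file name produced earlier if 500 x 500 happens to be the tile size used when none is specified, which is an assumption here.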

Resolution Levels

To set the number of resolution levels yourself, use the --levels option:

# Set the number of resolution levels to 2
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --levels 2

The --downsamples option sets the downsampling factor for each level:

# Set the per-level downsampling factors to 1, 2, and 4
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --downsamples 1 2 4

The --startOn and --stopOn options specify the first and last levels to generate.
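
As a rough illustration of how these two options might be combined, the sketch below builds the flags from a start and stop level and previews the resulting command with echo instead of running it. Whether level numbering starts at 0 and whether the stop level is inclusive are assumptions to check against the tool's own documentation; level_range_flags is a made-up helper name.

```shell
# Print --startOn/--stopOn flags after checking that the range is sensible.
# (level_range_flags is a hypothetical helper, not part of wsi2dcm.)
level_range_flags() {
  start=$1
  stop=$2
  if [ "$start" -gt "$stop" ]; then
    echo "start level ($start) must not exceed stop level ($stop)" >&2
    return 1
  fi
  echo "--startOn $start --stopOn $stop"
}

# Preview the full command without running it (example levels 1 and 2).
echo wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  $(level_range_flags 1 2)
```

Dropping the leading echo runs the conversion for real.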

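Batched Output

The --batch option of wsi2dcm sets the maximum number of frames stored in each DICOM file. This is useful for high-resolution pathology images, where a large number of frames can be split across multiple files:

# Limit each file to at most 10 frames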
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --batch 10
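
To get a feel for how many files a given batch size produces, the sketch below divides the frame count by the batch size and rounds up. It assumes every file except possibly the last is filled to the limit, and it takes 30 frames from the frames-0-30 file name seen earlier, which is an interpretation of that name rather than a documented fact. The function name is made up for illustration.

```shell
# Files needed for one level: ceiling of (frames / batch size).
# Usage: batch_file_count FRAMES BATCH
batch_file_count() {
  frames=$1
  batch=$2
  echo $(( (frames + batch - 1) / batch ))
}

# 30 frames at no more than 10 frames per file.
batch_file_count 30 10   # prints 3
```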

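Spatial Positioning

When a WSI image is converted to DICOM, the frames are stored with the TILED_FULL organization by default. To store them as TILED_SPARSE instead, add the --sparse flag. See DICOM PS3.3 for details.

# Store frames as TILED_SPARSE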
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --sparse

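Image Compression

By default, wsi2dcm compresses images with jpeg. To change the compression method, use the --compression option; the available values are jpeg, jpeg2000, and raw:

# Do not compress the image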
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --compression raw

When compression is enabled, the --compressionQuality option sets the compression quality, with values from 0 to 100:

# Set the compression quality to 90
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --compressionQuality 90

DICOM Tags

When converting a file with wsi2dcm, you must at least set the (0008,103E) Series Description tag with --seriesDescription. You can also use --studyId and --seriesId to set the (0020,000D) Study Instance UID and (0020,000E) Series Instance UID.

# Set the Study Instance UID and Series Instance UID
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --studyId 1.3.6.1.4.1.14519.5.2.1.7777.9002.29231581743327260 \
  --seriesId 1.3.6.1.4.1.14519.5.2.1.7777.9002.53195580009743992
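
Since the UIDs are typed by hand, a small format check before conversion can catch mistakes. The sketch below applies the basic DICOM UID rules: at most 64 characters, only digits and dots, and no leading zero in any component unless the component is exactly 0. It checks syntax only, not uniqueness or registration of the root, and is_valid_uid is a made-up helper name.

```shell
# Return 0 if the argument is syntactically a valid DICOM UID.
# (is_valid_uid is a hypothetical helper, not part of wsi2dcm.)
is_valid_uid() {
  uid=$1
  # At most 64 characters in total.
  [ "${#uid}" -le 64 ] || return 1
  # Dot-separated numeric components, none with a leading zero.
  printf '%s\n' "$uid" | grep -Eq '^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*$'
}

for uid in \
  1.3.6.1.4.1.14519.5.2.1.7777.9002.29231581743327260 \
  1.3.6.1.4.1.14519.5.2.1.7777.9002.53195580009743992
do
  if is_valid_uid "$uid"; then
    echo "valid:   $uid"
  else
    echo "invalid: $uid"
  fi
done
```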

To set additional DICOM tags, supply them in a JSON file. The following example of DICOM tags in JSON format contains three entries: (0010,0010) Patient's Name, (0010,0020) Patient ID, and (0010,0040) Patient's Sex:

{
    "00100010": {
        "vr": "PN",
        "Value": [
            "PD-1-MELANOMA-00016"
        ]
    },
    "00100020": {
        "vr": "LO",
        "Value": [
            "0326baf9-4e1b-47e3-9290-fd3d90e79ee0"
        ]
    },
    "00100040": {
        "vr": "CS",
        "Value": [
            "M"
        ]
    }
}

For details on the JSON representation of DICOM tags, see the DICOM JSON Format documentation.
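
One way to save the example tags to a file is a shell here-document, sketched below. The file name my_tags.json is simply the name used in this tutorial, and the optional syntax check relies on python3 being installed, which is an assumption; it verifies only that the file is well-formed JSON, not that the tags are correct.

```shell
# Write the example tags to my_tags.json.
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > my_tags.json <<'EOF'
{
    "00100010": { "vr": "PN", "Value": ["PD-1-MELANOMA-00016"] },
    "00100020": { "vr": "LO", "Value": ["0326baf9-4e1b-47e3-9290-fd3d90e79ee0"] },
    "00100040": { "vr": "CS", "Value": ["M"] }
}
EOF

# Optional syntax check, only if python3 happens to be available.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool my_tags.json >/dev/null && echo "my_tags.json is well-formed JSON"
fi
```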

Once the JSON file with the DICOM tags is ready, pass it to wsi2dcm with the --jsonFile option:

# Set DICOM tags from a JSON file
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --jsonFile my_tags.json

Multithreading

The --threads option of wsi2dcm sets the number of threads the program uses:

# Set the number of threads to 2
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --threads 2
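
The log shown earlier indicates that when the thread count is not set, the tool uses all available threads. On a shared machine you may prefer to leave some capacity free. The sketch below suggests a value of one less than the number of online CPUs; that policy and the function names are illustrative choices, not recommendations from the tool.

```shell
# Print the number of online CPUs, falling back to 1 if it cannot be determined.
cpu_count() {
  n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  echo "$n"
}

# Leave one core free for other work, but never go below 1 thread.
threads_for_conversion() {
  n=$(cpu_count)
  if [ "$n" -gt 1 ]; then
    echo $(( n - 1 ))
  else
    echo 1
  fi
}

echo "suggested --threads value: $(threads_for_conversion)"
```

The result can be passed directly, for example with --threads "$(threads_for_conversion)".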

Interpolation Algorithm

When downsampling, wsi2dcm uses nearest-neighbor interpolation by default. To use bilinear interpolation instead, add the --bilinearDownsampling flag.

# Downsample with bilinear interpolation
wsi2dcm \
  --input=CMU-1-Small-Region.svs \
  --outFolder=output \
  --seriesDescription=CMU-1-Small-Region \
  --bilinearDownsampling

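Troubleshooting

If the build fails with Could NOT find GTest, it is usually because libgtest-dev ships only uncompiled source code. You can build GTest yourself, following the approach suggested on StackOverflow.

# Install GTest and CMake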
sudo apt-get install libgtest-dev cmake

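# Build GTest yourself
cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make

# Copy the built static libraries to a directory where CMake can find them
# (their location inside the build tree varies between GTest versions)
sudo find . -name 'libgtest*.a' -exec cp {} /usr/lib/ \;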

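DICOM Conformance Issues

DICOM files produced by wsi2dcm are often flawed because the user did not supply the tags that the DICOM standard requires. To make sure the files are interoperable, validate them after conversion with the DICOM validation tools dciodvfy and dcentvfy, then fix whatever they report. In most cases the fix is to supply the missing DICOM tags through a JSON file at conversion time.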
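
The validation step might be scripted along the following lines. This is a sketch under assumptions: it presumes dciodvfy is installed separately (to my knowledge it is distributed with dicom3tools, which is outside the scope of this tutorial) and that it accepts a file path as its argument. The function name verify_dicom_dir and the report file naming are made up for illustration.

```shell
# Run dciodvfy on every .dcm file in a directory, saving one report per file.
# (verify_dicom_dir is a hypothetical helper name.)
# Usage: verify_dicom_dir output
verify_dicom_dir() {
  dir=$1
  if ! command -v dciodvfy >/dev/null 2>&1; then
    echo "dciodvfy not found; install it before validating" >&2
    return 127
  fi
  for f in "$dir"/*.dcm; do
    [ -e "$f" ] || continue
    # Capture both stdout and stderr so no message is lost.
    dciodvfy "$f" > "$f.dciodvfy.txt" 2>&1
    echo "report written: $f.dciodvfy.txt"
  done
}
```

Reading the reports and adding the missing tags to the JSON file, then converting again, closes the loop. dcentvfy can be run in a similar way when you need to check consistency across several files.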