Python

Python 讀取 SAS7BDAT 格式 SAS 檔案教學與範例

介紹如何在 Python 中使用 sas7bdat 模組讀取 SAS7BDAT 格式的 SAS 檔案。

安裝 sas7bdat 模組

Python 的 sas7bdat 模組可以透過 pip 指令來安裝:

# 安裝 sas7bdat 模組
pip install sas7bdat

讀取 SAS7BDAT 格式 SAS 檔案

這裡我們以 airline.sas7bdat 這個檔案來示範如何使用 Python 的 sas7bdat 模組來讀取 SAS7BDAT 格式的 SAS 檔案。

首先以 sas7bdat 模組的 SAS7BDAT 函數開啟 SAS7BDAT 檔案:

# 引入 sas7bdat 模組的 SAS7BDAT 函數
from sas7bdat import SAS7BDAT

# 開啟 SAS7BDAT 格式的 SAS 檔案
reader = SAS7BDAT('airline.sas7bdat')

reader.header 中包含了詳細的標頭資訊:

# 標頭資訊
print(reader.header)
Header:
        col_count_p1: 6
        col_count_p2: 0
        column_count: 6
        compression: None
        creator: Written by SAS
        creator_proc: None
        date_created: 2008-05-13 15:25:11
        date_modified: 2008-05-13 15:25:11
        endianess: little
        file_type: DATA
        filename: airline.sas7bdat
        header_length: 1024
        lcp: 0
        lcs: 16
        mix_page_row_count: 32
        name: AIRLINE
        os_name:
        os_type:
        page_count: 1
        page_length: 4096
        platform: windows
        row_count: 32
        row_length: 44
        sas_release: 9.0000M0
        server_type: WIN
        u64: False

Contents of dataset "AIRLINE":
Num Name    Type   Length Format Label
--- ------- ------ ------ ------ ------------------
  1 YEAR    number      4        year
  2 Y       number      8        level of output
  3 W       number      8        wage rate
  4 R       number      8        interest rate
  5 L       number      8        labor input
  6 K       number      8        capital input

透過 reader.columns 可以取得 SAS 變數的屬性(variable attributes)

# 取得變數屬性(Variable Attributes)
for c in reader.columns:
    print("col_id:", c.col_id)
    print("  name:", c.name.decode(encoding = "utf-8"))
    print("  label:", c.label.decode(encoding = "utf-8"))
    print("  format:", c.format)
    print("  type:", c.type)
    print("  length:", c.length)
col_id: 0
  name: YEAR
  label: year
  format:
  type: number
  length: 4
col_id: 1
  name: Y
  label: level of output
  format:
  type: number
  length: 8
col_id: 2
  name: W
  label: wage rate
  format:
  type: number
  length: 8
col_id: 3
  name: R
  label: interest rate
  format:
  type: number
  length: 8
col_id: 4
  name: L
  label: labor input
  format:
  type: number
  length: 8
col_id: 5
  name: K
  label: capital input
  format:
  type: number
  length: 8

實際的資料內容則可透過 reader 取得:

# 讀取資料
for r in reader:
    print(r)
['YEAR', 'Y', 'W', 'R', 'L', 'K']
[1948.0, 1.2139999866485596, 0.24300000071525574, 0.1454000025987625, 1.4149999618530273, 0.6119999885559082]
[1949.0, 1.3539999723434448, 0.25999999046325684, 0.21809999644756317, 1.3839999437332153, 0.5590000152587891]
[1950.0, 1.569000005722046, 0.27799999713897705, 0.3156999945640564, 1.3880000114440918, 0.5730000138282776]
[1951.0, 1.9479999542236328, 0.296999990940094, 0.39399999380111694, 1.5499999523162842, 0.5640000104904175]
[1952.0, 2.265000104904175, 0.3100000023841858, 0.35589998960494995, 1.8020000457763672, 0.5740000009536743]
[1953.0, 2.7309999465942383, 0.32199999690055847, 0.35929998755455017, 1.9259999990463257, 0.7110000252723694]
[1954.0, 3.0250000953674316, 0.33500000834465027, 0.4025000035762787, 1.9639999866485596, 0.7760000228881836]
[1955.0, 3.562000036239624, 0.3499999940395355, 0.3961000144481659, 2.115999937057495, 0.8270000219345093]
[1956.0, 3.9790000915527344, 0.3610000014305115, 0.3822000026702881, 2.434999942779541, 0.800000011920929]
[1957.0, 4.420000076293945, 0.3790000081062317, 0.304500013589859, 2.7070000171661377, 0.9210000038146973]
[1958.0, 4.563000202178955, 0.39100000262260437, 0.32839998602867126, 2.7060000896453857, 1.0670000314712524]
[1959.0, 5.385000228881836, 0.4259999990463257, 0.3856000006198883, 2.8459999561309814, 1.0829999446868896]
[1960.0, 5.553999900817871, 0.4410000145435333, 0.31929999589920044, 3.0889999866485596, 1.4809999465942383]
[1961.0, 5.465000152587891, 0.46000000834465027, 0.30790001153945923, 3.121999979019165, 1.7359999418258667]
[1962.0, 5.824999809265137, 0.48500001430511475, 0.3783000111579895, 3.184000015258789, 1.9259999990463257]
[1963.0, 6.875999927520752, 0.5059999823570251, 0.4180000126361847, 3.263000011444092, 2.0409998893737793]
[1964.0, 7.822999954223633, 0.5379999876022339, 0.5163000226020813, 3.4119999408721924, 1.996999979019165]
[1965.0, 9.119999885559082, 0.5640000104904175, 0.5878999829292297, 3.622999906539917, 2.256999969482422]
[1966.0, 10.51200008392334, 0.5860000252723694, 0.536899983882904, 4.073999881744385, 2.742000102996826]
[1967.0, 13.020000457763672, 0.621999979019165, 0.44429999589920044, 4.710000038146973, 3.563999891281128]
[1968.0, 15.26099967956543, 0.6660000085830688, 0.3052000105381012, 5.2170000076293945, 4.767000198364258]
[1969.0, 16.312999725341797, 0.7310000061988831, 0.23319999873638153, 5.568999767303467, 6.511000156402588]
[1970.0, 16.00200080871582, 0.8309999704360962, 0.1882999986410141, 5.494999885559082, 7.626999855041504]
[1971.0, 15.87600040435791, 0.906000018119812, 0.20229999721050262, 5.334000110626221, 8.67300033569336]
[1972.0, 16.66200065612793, 1.0, 0.25060001015663147, 5.34499979019165, 8.331000328063965]
[1973.0, 17.013999938964844, 1.055999994277954, 0.2667999863624573, 5.6620001792907715, 8.557000160217285]
[1974.0, 19.30500030517578, 1.13100004196167, 0.266400009393692, 5.729000091552734, 9.508000373840332]
[1975.0, 18.72100067138672, 1.246999979019165, 0.23010000586509705, 5.7220001220703125, 9.062000274658203]
[1976.0, 19.25, 1.375, 0.3452000021934509, 5.76200008392334, 8.26200008392334]
[1977.0, 20.64699935913086, 1.5440000295639038, 0.45080000162124634, 5.876999855041504, 7.473999977111816]
[1978.0, 22.72599983215332, 1.7029999494552612, 0.5877000093460083, 6.107999801635742, 7.104000091552734]
[1979.0, 23.618999481201172, 1.7790000438690186, 0.534600019454956, 6.8520002365112305, 6.874000072479248]

若習慣以 Pandas 來處理資料的話,亦可直接將 SAS7BDAT 檔案的內容轉為 Pandas 的 DataFrame 資料格式:

# 將資料直接轉換為 Pandas DataFrame
df = reader.to_data_frame()
df
      YEAR          Y      W       R      L      K
0   1948.0   1.214000  0.243  0.1454  1.415  0.612
1   1949.0   1.354000  0.260  0.2181  1.384  0.559
2   1950.0   1.569000  0.278  0.3157  1.388  0.573
3   1951.0   1.948000  0.297  0.3940  1.550  0.564
4   1952.0   2.265000  0.310  0.3559  1.802  0.574
5   1953.0   2.731000  0.322  0.3593  1.926  0.711
6   1954.0   3.025000  0.335  0.4025  1.964  0.776
7   1955.0   3.562000  0.350  0.3961  2.116  0.827
8   1956.0   3.979000  0.361  0.3822  2.435  0.800
9   1957.0   4.420000  0.379  0.3045  2.707  0.921
10  1958.0   4.563000  0.391  0.3284  2.706  1.067
11  1959.0   5.385000  0.426  0.3856  2.846  1.083
12  1960.0   5.554000  0.441  0.3193  3.089  1.481
13  1961.0   5.465000  0.460  0.3079  3.122  1.736
14  1962.0   5.825000  0.485  0.3783  3.184  1.926
15  1963.0   6.876000  0.506  0.4180  3.263  2.041
16  1964.0   7.823000  0.538  0.5163  3.412  1.997
17  1965.0   9.120000  0.564  0.5879  3.623  2.257
18  1966.0  10.512000  0.586  0.5369  4.074  2.742
19  1967.0  13.020000  0.622  0.4443  4.710  3.564
20  1968.0  15.261000  0.666  0.3052  5.217  4.767
21  1969.0  16.313000  0.731  0.2332  5.569  6.511
22  1970.0  16.002001  0.831  0.1883  5.495  7.627
23  1971.0  15.876000  0.906  0.2023  5.334  8.673
24  1972.0  16.662001  1.000  0.2506  5.345  8.331
25  1973.0  17.014000  1.056  0.2668  5.662  8.557
26  1974.0  19.305000  1.131  0.2664  5.729  9.508
27  1975.0  18.721001  1.247  0.2301  5.722  9.062
28  1976.0  19.250000  1.375  0.3452  5.762  8.262
29  1977.0  20.646999  1.544  0.4508  5.877  7.474
30  1978.0  22.726000  1.703  0.5877  6.108  7.104
31  1979.0  23.618999  1.779  0.5346  6.852  6.874

若要將資料匯出為逗點分隔(CSV)檔案,可以使用 convert_file 函數,而其 delimiter 可以指定欄位分隔字元(預設為逗號):

# 將資料匯出為 CSV 檔案
reader.convert_file("output.csv", delimiter=',')

檔案使用完畢後,呼叫 close 關閉檔案:

# 關閉檔案
reader.close()

sas7bdat_to_csv 指令稿工具

在安裝 Python 的 sas7bdat 模組時,會附帶安裝一個 sas7bdat_to_csv 指令稿工具,它可以用來將 SAS7BDAT 格式的 SAS 檔案轉換為逗點分隔(CSV)檔案:

# 將 SAS7BDAT 檔案轉換為 CSV 檔案
sas7bdat_to_csv airline.sas7bdat
[airline.sas7bdat] ⟶ [airline.csv] wrote 32 of 32 lines

Successfully converted 1 of 1 file

新產生的 CSV 檔案會儲存在 airline.csv

Share
Published by
Office Guide

Recent Posts

Python 使用 PyAutoGUI 自動操作滑鼠與鍵盤

本篇介紹如何在 Python ...

1 年 ago

Ubuntu Linux 以 WireGuard 架設 VPN 伺服器教學與範例

本篇介紹如何在 Ubuntu ...

1 年 ago

Linux 網路設定 ip 指令用法教學與範例

本篇介紹如何在 Linux 系...

1 年 ago

Linux 以 Cryptsetup、LUKS 加密 USB 隨身碟教學與範例

介紹如何在 Linux 系統中...

1 年 ago