Published 2024-04-03.
There are many flavors of EBCDIC, and IBM uses the term code page to distinguish between them. I ran the following command on WSL/Ubuntu and got 187 code page names that start with IBM. Some names look like aliases (for example, IBM-1047 and IBM1047), which suggests that there are perhaps only a few dozen flavors of EBCDIC.
$ iconv -l | grep '^IBM.*' | tr -d '//' | sort | tr '\n' ' ' | \
  fold -sw 70 | clip.exe
IBM-1008 IBM-1025 IBM-1046 IBM-1047 IBM-1097 IBM-1112 IBM-1122
IBM-1123 IBM-1124 IBM-1129 IBM-1130 IBM-1132 IBM-1133 IBM-1137
IBM-1140 IBM-1141 IBM-1142 IBM-1143 IBM-1144 IBM-1145 IBM-1146
IBM-1147 IBM-1148 IBM-1149 IBM-1153 IBM-1154 IBM-1155 IBM-1156
IBM-1157 IBM-1158 IBM-1160 IBM-1161 IBM-1162 IBM-1163 IBM-1164
IBM-1166 IBM-1167 IBM-12712 IBM-1364 IBM-1371 IBM-1388 IBM-1390
IBM-1399 IBM-16804 IBM-4517 IBM-4899 IBM-4909 IBM-4971 IBM-5347
IBM-803 IBM-856 IBM-901 IBM-902 IBM-9030 IBM-9066 IBM-921 IBM-922
IBM-930 IBM-932 IBM-933 IBM-935 IBM-937 IBM-939 IBM-943 IBM-9448
IBM037 IBM038 IBM1004 IBM1008 IBM1025 IBM1026 IBM1046 IBM1047 IBM1089
IBM1097 IBM1112 IBM1122 IBM1123 IBM1124 IBM1129 IBM1130 IBM1132
IBM1133 IBM1137 IBM1140 IBM1141 IBM1142 IBM1143 IBM1144 IBM1145
IBM1146 IBM1147 IBM1148 IBM1149 IBM1153 IBM1154 IBM1155 IBM1156
IBM1157 IBM1158 IBM1160 IBM1161 IBM1162 IBM1163 IBM1164 IBM1166
IBM1167 IBM12712 IBM1364 IBM1371 IBM1388 IBM1390 IBM1399 IBM16804
IBM256 IBM273 IBM274 IBM275 IBM277 IBM278 IBM280 IBM281 IBM284 IBM285
IBM290 IBM297 IBM367 IBM420 IBM423 IBM424 IBM437 IBM4517 IBM4899
IBM4909 IBM4971 IBM500 IBM5347 IBM775 IBM803 IBM813 IBM819 IBM848
IBM850 IBM851 IBM852 IBM855 IBM856 IBM857 IBM858 IBM860 IBM861 IBM862
IBM863 IBM864 IBM865 IBM866 IBM866NAV IBM868 IBM869 IBM870 IBM871
IBM874 IBM875 IBM880 IBM891 IBM901 IBM902 IBM903 IBM9030 IBM904 IBM905
IBM9066 IBM912 IBM915 IBM916 IBM918 IBM920 IBM921 IBM922 IBM930 IBM932
IBM933 IBM935 IBM937 IBM939 IBM943 IBM9448
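To test the alias guess, one option is to treat the hyphenated and unhyphenated spellings as the same name and count what remains. This is only a sketch: normalize_ibm_names is a hypothetical helper, and the assumption that IBM-1047 and IBM1047 refer to the same code page is unverified.

```shell
# Hypothetical helper: rewrite "IBM-1047" as "IBM1047" and drop duplicates.
# Assumes (unverified) that the two spellings are aliases of one code page.
normalize_ibm_names() {
  sed 's/^IBM-/IBM/' | sort -u
}

# Usage, with the same name-listing pipeline as above:
#   iconv -l | grep '^IBM' | tr -d '/' | normalize_ibm_names | wc -l
```

By my count, every hyphenated name in the list above has an unhyphenated twin, so this would reduce the 187 names to 122.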
I looked for an explanation of the code pages but could not find one. After a search, I discovered that the IBM document entitled z/OS ICSF Writing PKCS #11 Applications says ‘The IBM1047 code page is assumed’ on page 1. Perhaps that is a clue that IBM1047 is the most common code page, or perhaps it is the most compatible EBCDIC variant.
Then I looked at the source code for win_iconv.c, and found the following comments:
{37, "IBM037"}, /* IBM EBCDIC US-Canada */
{437, "IBM437"}, /* OEM United States */
{500, "IBM500"}, /* IBM EBCDIC International */
{708, "ASMO-708"}, /* Arabic (ASMO 708) */
/* 709 Arabic (ASMO-449+, BCON V4) */
/* 710 Arabic - Transparent Arabic */
{720, "DOS-720"}, /* Arabic (Transparent ASMO); Arabic (DOS) */
{737, "ibm737"}, /* OEM Greek (formerly 437G); Greek (DOS) */
{775, "ibm775"}, /* OEM Baltic; Baltic (DOS) */
{850, "ibm850"}, /* OEM Multilingual Latin 1; Western European (DOS) */
{852, "ibm852"}, /* OEM Latin 2; Central European (DOS) */
{855, "IBM855"}, /* OEM Cyrillic (primarily Russian) */
{857, "ibm857"}, /* OEM Turkish; Turkish (DOS) */
{858, "IBM00858"}, /* OEM Multilingual Latin 1 + Euro symbol */
{860, "IBM860"}, /* OEM Portuguese; Portuguese (DOS) */
{861, "ibm861"}, /* OEM Icelandic; Icelandic (DOS) */
{862, "DOS-862"}, /* OEM Hebrew; Hebrew (DOS) */
{863, "IBM863"}, /* OEM French Canadian; French Canadian (DOS) */
{864, "IBM864"}, /* OEM Arabic; Arabic (864) */
{865, "IBM865"}, /* OEM Nordic; Nordic (DOS) */
{870, "IBM870"}, /* IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2 */
{1026, "IBM1026"}, /* IBM EBCDIC Turkish (Latin 5) */
{1047, "IBM01047"}, /* IBM EBCDIC Latin 1/Open System */
{1140, "IBM01140"}, /* IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro) */
{1141, "IBM01141"}, /* IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro) */
{1142, "IBM01142"}, /* IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro) */
{1143, "IBM01143"}, /* IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro) */
{1144, "IBM01144"}, /* IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro) */
{1145, "IBM01145"}, /* IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro) */
{1146, "IBM01146"}, /* IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro) */
{1147, "IBM01147"}, /* IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro) */
{1148, "IBM01148"}, /* IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro) */
{1149, "IBM01149"}, /* IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro) */
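These comments suggest that only some IBM code pages are EBCDIC; the entries labeled OEM are DOS code pages. A small filter can pull out just the names whose comment mentions EBCDIC. This is a sketch, assuming the table lines are available on standard input (for example from a local copy of win_iconv.c); ebcdic_names is a hypothetical helper name.

```shell
# Hypothetical helper: read win_iconv.c-style table lines on stdin and print
# the quoted name of every entry whose comment mentions EBCDIC.
ebcdic_names() {
  grep 'EBCDIC' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Usage, assuming a local copy of the source file:
#   ebcdic_names < win_iconv.c
```

Applied to the excerpt above, this should keep IBM037, IBM500, IBM870, IBM1026, IBM01047 and IBM01140 through IBM01149, and drop the OEM and Arabic entries.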
Single Conversion
The following converts a file from EBCDIC flavor IBM-1047 to UTF-8.
$ FILE='CDROM/cicsadp/Cobol Application/TSO/source/cicsadp.mac'
$ iconv -f IBM-1047 -t UTF-8 "$FILE" > "$FILE".text
When I tried this conversion on $FILE, the result contained a lot of garbage characters.
I decided to try converting the file using the other encodings.
Testing 187 Code Pages
I wrote this script to attempt to find an encoding for the file I had to work with:
#!/bin/bash

function ctrl_c() {
  exit
}
trap ctrl_c INT

FILE='CDROM/cicsadp/Cobol Application/TSO/source/cicsadp.mac'

# Try every IBM code page that iconv knows about, one at a time.
while read -r CODE_PAGE; do
  RESULT="$( iconv -f "$CODE_PAGE" -t UTF-8 "$FILE" )"
  # Pass the text as a printf argument, not as the format string,
  # so that any % or \ in the converted file is shown literally.
  printf '%s\n' "Code page: $CODE_PAGE
To try the next code page, press the letter 'q'.
To stop this script, press CTRL-C then press the letter 'q'.
===================
$RESULT" | less
done < <(iconv -l | grep '^IBM.*' | tr -d '//' | sort)
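Paging through 187 results by eye is slow, so a non-interactive variant could score each conversion instead. The sketch below counts bytes that are neither printable ASCII nor whitespace and lists the code pages with the fewest such bytes first. Both function names are hypothetical, and the scoring assumes the file should decode to plain ASCII text, which seems plausible for a COBOL source file but is not confirmed.

```shell
# Hypothetical helper: count the bytes on stdin that are neither printable
# ASCII nor whitespace. Valid non-ASCII UTF-8 is counted too, which is an
# acceptable simplification only if the file should be plain ASCII text.
count_unprintable() {
  LC_ALL=C tr -d '[:print:][:space:]' | wc -c | tr -d ' '
}

# Hypothetical helper: convert file $1 with every IBM code page that iconv
# lists, and print "count name" pairs with the cleanest conversions first.
rank_code_pages() {
  tmp="$(mktemp)"
  iconv -l | grep '^IBM' | tr -d '/' | sort |
  while read -r code_page; do
    # Skip code pages for which iconv reports a conversion error.
    if iconv -f "$code_page" -t UTF-8 "$1" > "$tmp" 2>/dev/null; then
      printf '%s %s\n' "$(count_unprintable < "$tmp")" "$code_page"
    fi
  done | sort -n
  rm -f "$tmp"
}

# Usage: rank_code_pages "$FILE" | head
```

If even the best-ranked code page leaves many unprintable bytes, that would hint that the file is not plain EBCDIC text in any of these code pages.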
Each of the 187 conversions contained garbage. Not sure what to think.
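One possible next step is to check whether the file looks like EBCDIC text at all. In EBCDIC a space is byte 0x40, whereas in ASCII it is 0x20, so comparing how often each byte occurs gives a rough hint. This is a minimal sketch: count_byte is a hypothetical helper, and the heuristic assumes the file holds ordinary text with plenty of spaces.

```shell
# Hypothetical helper: count how often byte $1 (two lowercase hex digits)
# occurs in file $2, using od to dump the file one byte per token.
count_byte() {
  od -An -v -tx1 "$2" | tr -s ' ' '\n' | grep -c "^$1\$"
}

# Usage:
#   count_byte 40 "$FILE"   # EBCDIC space
#   count_byte 20 "$FILE"   # ASCII space
```

If neither count is large, the file may be compressed, packed, or in some other binary format, in which case no code page would produce readable text.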
Ruby: 17 Code Pages
The following displays all available encodings whose names start with IBM for Ruby 3.1.2p20:
irb(main):001> Encoding.name_list.select { |x| x.start_with? 'IBM' }.sort.join(", ")
Output is:
IBM037, IBM437, IBM720, IBM737, IBM775, IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM869
It does not look like Ruby supports the IBM1047 code page for EBCDIC. Again not sure what to think.