Looking For The Right Way With Regular Expression With Groups In Different Order
Solution 1:
PyParsing (https://github.com/pyparsing/pyparsing) is a good module for building grammars easily. You can write a basic Copybook grammar and parse the file with it. You would then have to post-process the result to recover the tree-like structure that the two-digit level fields represent.
Also take a look at the Copybook package (https://github.com/zalmane/copybook) which uses PyParsing.
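The post-processing step mentioned above can be sketched without any parser library. This is a minimal illustration that assumes the fields have already been reduced to (level, name) pairs; `build_tree` and the dictionary layout are hypothetical names chosen for this example, not part of PyParsing or the Copybook package. Special level numbers such as 66, 77 and 88 are not handled.

```python
def build_tree(fields):
    """Nest (level, name) pairs into a tree using a stack of open groups."""
    # Artificial root with level 0 so every real field has a parent.
    root = {"level": 0, "name": "ROOT", "children": []}
    stack = [root]
    for level, name in fields:
        node = {"level": level, "name": name, "children": []}
        # Close every open group whose level is >= the new field's level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


fields = [
    (1, "Ams-Vendor"),
    (3, "Brand"),
    (3, "Location-details"),
    (5, "Location-Number"),
    (5, "Location-Type"),
    (3, "Location-Active"),
]
tree = build_tree(fields)
vendor = tree["children"][0]
print([child["name"] for child in vendor["children"]])
```

A field becomes a child of the nearest preceding field with a smaller level number, which is how the level fields encode nesting in a copybook.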
Solution 2:
cb2xml
You should look at cb2xml. It parses a Cobol Copybook and creates an XML file, which you can then process in Python or any other language. The cb2xml package includes basic examples of processing the XML in Python and other languages.
Cobol:
01 Ams-Vendor.
   03 Brand Pic x(3).
   03 Location-details.
      05 Location-Number Pic 9(4).
      05 Location-Type Pic XX.
      05 Location-Name Pic X(35).
   03 Address-Details.
      05 actual-address.
         10 Address-1 Pic X(40).
         10 Address-2 Pic X(40).
         10 Address-3 Pic X(35).
      05 Postcode Pic 9(4).
      05 Empty pic x(6).
      05 State Pic XXX.
   03 Location-Active Pic X.
Output from cb2xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<copybook filename="cbl2xml_Test110.cbl">
  <item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
    <item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
    <item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
      <item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
      <item display-length="2" level="05" name="Location-Type" picture="XX" position="8" storage-length="2"/>
      <item display-length="35" level="05" name="Location-Name" picture="X(35)" position="10" storage-length="35"/>
    </item>
    <item display-length="128" level="03" name="Address-Details" position="45" storage-length="128">
      <item display-length="115" level="05" name="actual-address" position="45" storage-length="115">
        <item display-length="40" level="10" name="Address-1" picture="X(40)" position="45" storage-length="40"/>
        <item display-length="40" level="10" name="Address-2" picture="X(40)" position="85" storage-length="40"/>
        <item display-length="35" level="10" name="Address-3" picture="X(35)" position="125" storage-length="35"/>
      </item>
      <item display-length="4" level="05" name="Postcode" numeric="true" picture="9(4)" position="160" storage-length="4"/>
      <item display-length="6" level="05" name="Empty" picture="x(6)" position="164" storage-length="6"/>
      <item display-length="3" level="05" name="State" picture="XXX" position="170" storage-length="3"/>
    </item>
    <item display-length="1" level="03" name="Location-Active" picture="X" position="173" storage-length="1"/>
  </item>
</copybook>
An interesting application of cb2xml is described in "Dynamically Reading COBOL Redefines with C#".
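As a rough idea of what processing the cb2xml output in Python might look like, here is a sketch using only the standard library's xml.etree.ElementTree. The XML is an abbreviated copy of the output above; `leaf_fields` is a hypothetical helper written for this example and is not taken from the cb2xml package. It assumes that elementary fields are exactly the items that carry a picture attribute.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the cb2xml output shown above.
XML = """<copybook filename="cbl2xml_Test110.cbl">
  <item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
    <item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
    <item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
      <item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
    </item>
  </item>
</copybook>"""


def leaf_fields(xml_text):
    """Return (name, position, storage_length) for every item with a picture."""
    root = ET.fromstring(xml_text)
    fields = []
    # iter() walks nested <item> elements in document order.
    for item in root.iter("item"):
        if "picture" in item.attrib:
            fields.append((
                item.get("name"),
                int(item.get("position")),
                int(item.get("storage-length")),
            ))
    return fields


fields = leaf_fields(XML)
for name, position, length in fields:
    print(name, position, length)
```

Since the first field is reported at position 1, the positions appear to be 1-based, so a fixed-width text record could presumably be sliced as record[position - 1:position - 1 + length].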
CobolToCsv
The CobolToCsv package will convert a Cobol-Data-File to a Csv file. Limitations:
- Redefines / Multi-Record files are not handled
- Only a fairly limited range of Cobol compilers is supported (Mainframe, Gnu Cobol, Fujitsu-Cobol).
Cobol2Csv should be able to handle text files (+ Comp-3). It may handle some of your files.
Solution 3:
Although an actual parser like PLY or parsely would be best for this, if you have to use regex, can't you just add another OCCURS group with a different key? For example:
"""
03 AMOUNT-BREAKDOWN PICTURE 9(8)V99 VALUE ZEROES.
03 AMOUNT-BREAKDOWN-X REDEFINES AMOUNT-BREAKDOWN.
05 FILLER PICTURE X(3) VALUE "DEC".
03 MONTH REDEFINES MONTH-TAB PICTURE X(3) OCCURS 12 TIMES.
03 SUB PICTURE 99 VALUE 0.
03 NUMBER-HOLD.
05 NUMB-HOLD PICTURE X OCCURS 11 TIMES.
05 FILLER PICTURE X(5) VALUE "TEN".
03 DIGIT-TAB2 REDEFINES DIGIT-TAB1.
05 DIGIT-TABLE OCCURS 10 PICTURE X(5).
03 WK-TEN-MILLION PICTURE X(5) VALUE SPACES.
"""
import re

# The copybook lines are the module docstring above.
for line in __doc__.split("\n"):
    if len(line) < 1:
        continue  # skip blank lines
    m = re.match(
        r"^(?P<level>\d{2})\s+(?P<name>\S+).*?"
        r"(\s+INDEXED BY\s+(?P<indexed_by>\S+))?.*?"
        r"(\s+REDEFINES\s+(?P<redefines>\S+))?.*?"
        r"(\s+OCCURS\s+(?P<occurs1>\d+).?( TIMES)?)?.*?"  # <-- occurs1: OCCURS before PIC
        r"(\s+PIC(TURE)?\s+(?P<pic>\S+))?.*?"
        r"(\s+OCCURS\s+(?P<occurs>\d+).?( TIMES)?)?.*?"  # OCCURS after PIC
        r"(\s+(?P<comp>COMP\S*))?.*?"  # was an empty named group; now captures the COMP word
        r"(\s+VALUE\s+(?P<value>\S+).*)?"
        r"\.$", line)
    if m:
        print(m.groups())
Sample output:
('03', 'AMOUNT-BREAKDOWN', None, None, None, None, None, None, None, ' PICTURE 9(8)V99', 'TURE', '9(8)V99', None, None, None, None, None, ' VALUE ZEROES', 'ZEROES')
('03', 'AMOUNT-BREAKDOWN-X', None, None, ' REDEFINES AMOUNT-BREAKDOWN', 'AMOUNT-BREAKDOWN', None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'FILLER', None, None, None, None, None, None, None, ' PICTURE X(3)', 'TURE', 'X(3)', None, None, None, None, None, ' VALUE "DEC"', '"DEC"')
('03', 'MONTH', None, None, ' REDEFINES MONTH-TAB', 'MONTH-TAB', None, None, None, ' PICTURE X(3)', 'TURE', 'X(3)', ' OCCURS 12 ', '12', None, None, None, None, None)
('03', 'SUB', None, None, None, None, None, None, None, ' PICTURE 99', 'TURE', '99', None, None, None, None, None, ' VALUE 0', '0')
('03', 'NUMBER-HOLD', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'NUMB-HOLD', None, None, None, None, None, None, None, ' PICTURE X', 'TURE', 'X', ' OCCURS 11 ', '11', None, None, None, None, None)
('05', 'FILLER', None, None, None, None, None, None, None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, ' VALUE "TEN"', '"TEN"')
('03', 'DIGIT-TAB2', None, None, ' REDEFINES DIGIT-TAB1', 'DIGIT-TAB1', None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'DIGIT-TABLE', None, None, None, None, ' OCCURS 10 ', '10', None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, None, None)
('03', 'WK-TEN-MILLION', None, None, None, None, None, None, None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, ' VALUE SPACES', 'SPACES')
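If the clauses can appear in any order, an alternative to duplicating groups is to match the level and name once and then search for each clause independently. The following is only a sketch of that idea, not the approach used in the answer above; `HEAD`, `CLAUSES` and `parse_line` are names made up for this example. It assumes one field per line and will misfire on VALUE literals that contain spaces or clause keywords.

```python
import re

# Level number and field name always come first.
HEAD = re.compile(r"^(?P<level>\d{2})\s+(?P<name>\S+)")

# Each clause is searched on its own, so clause order does not matter.
CLAUSES = {
    "redefines": re.compile(r"\sREDEFINES\s+(\S+)"),
    "occurs": re.compile(r"\sOCCURS\s+(\d+)"),
    "pic": re.compile(r"\sPIC(?:TURE)?\s+(\S+)"),
    "value": re.compile(r"\sVALUE\s+(\S+)"),
}


def parse_line(line):
    """Return a dict of clause values, or None if the line is not a field."""
    text = line.strip().rstrip(".")  # drop the terminating period
    head = HEAD.match(text)
    if head is None:
        return None
    field = head.groupdict()
    rest = text[head.end():]
    for key, pattern in CLAUSES.items():
        found = pattern.search(rest)
        field[key] = found.group(1) if found else None
    return field


print(parse_line("03 MONTH REDEFINES MONTH-TAB PICTURE X(3) OCCURS 12 TIMES."))
print(parse_line("05 DIGIT-TABLE OCCURS 10 PICTURE X(5)."))
```

Both sample lines yield the same keys whether OCCURS comes before or after PICTURE, and the result is addressed by name rather than by position in a long tuple.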