Client Server Corner - SQL Server Data Loading Part I - Bill Wolff

SQL Server offers excellent performance benefits for very large files. Arev tasks that normally take hours are reduced to minutes. Many corporations have pilot projects under way that exercise the client server relationship, and the Arev SQL Server bond is uniquely positioned to handle that challenge. Most downsizing projects involve large data sets from a mainframe or mini environment. The data is downloaded and converted to ASCII for input to the server. Another popular approach is upsizing swollen Arev applications that are slowing under the weight of excessive data entry. In either case, client programs are typically written to port the data to SQL Server. We will explore four of these techniques and compare the efficiency of each.

All four methods rely on R/BASIC programming. The first uses bond technology to map read and write statements to SQL inserts. The second and third build the SQL insert statements in code and pass them to the server with a subroutine call; EXEC_TRANS_SQL and MSSQL_C are the routines supplied with the bond package that perform this task. The last method builds an ASCII file with OSBWRITE that is compatible with the SQL Server bulk copy utility. A fair test might include the Arev IMPORT.EXPORT utility as a fifth method, but that alternative was considered too slow.

SQL Server Basics

A few points on SQL Server architecture are worth mentioning before we code. Accounts are called databases, data files are called tables, fields are called columns, and records are called rows. This nomenclature is generic to Structured Query Language and is central to the Arev and Open Engine SQL implementations. Statements are passed to the server for data definition (DDL) and data manipulation (DML). In all cases, statements are text strings with proper syntax. INSERT is the verb for creating new rows. In Arev, a row might be added to VERBS as follows:

INSERT INTO VERBS VALUES ('LISTDICT', 'RBASIC', 'VERBS', NULL, 'V45')

Note that key fields in SQL are one or more unique columns; SQL Server defines unique fields with an index statement. LISTDICT would normally be the key in Arev programming but appears here as the first column in the list. The other columns follow in order. The keyword NULL is used to imply an unknown value instead of the Arev convention of "". For comparison, the same result would be generated with the following R/BASIC statements:

0001 OPEN 'VERBS' TO F_VERBS THEN
0002    WRITE 'RBASIC': @FM: 'VERBS': @FM: @FM: 'V45' ON F_VERBS, 'LISTDICT'
0003 END

Once the syntactical conventions are understood, data loading can begin. At some point, Arev or any other front end must connect to the server and speak with this syntax. The Arev bond provides the translation and connectivity functions necessary for this to work. The bond has several layers, including a base filing system (BFS) and supporting subroutines. By addressing the bond at the appropriate layer, performance and ease of use can be balanced as needed.

Bonding Principles

Arev bonds to a SQL Server table by attaching it with the proper volume pointer. SQL Server volume pointers are defined in the SETVOLUME window. A name is defined and stored as a key in VOC. The record has the following structure:

<1> VOLUME
<2> MSSQL_BFS
<3> server name :@VM: database name :@VM: user login :@VM: password
<4> control file location

The parameters in field three are accessed in the SETVOLUME window with the F2 option key.
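For illustration, a volume pointer named TEST_SQL (the volume name and database match the examples later in this article; the server name, login, and password are invented) might be stored in VOC as:

<1> VOLUME
<2> MSSQL_BFS
<3> NETSERVER :@VM: TEST :@VM: bwolff :@VM: secret
<4> C:\SQLDICT

Field four names the DOS directory that will receive the generated dictionary described below.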
The field three parameters refer to the server name on the LAN, the database, and the login/password combination defined on SQL Server. This combination connects to the server and provides specific rights. These rights are controlled by the server administrator and may prevent you from accessing desired tables and procedures. Once this volume pointer is defined, LISTMEDIA and ATTACH work the same way they do with linear hash files.

Attaching a file creates a phantom Arev dictionary in the directory listed in the control file location mentioned above. This dictionary has field definitions in standard Arev format. Key fields are determined by querying SQL Server for a unique clustered index on the table. Symbolic and group fields can be added as needed and will work as advertised. If the table structure is changed on the server, the field definitions will need to be regenerated. The bonded table and its dictionary are also listed in the FILES file.

The file handle of the SQL table has an interesting structure; you can find it in field 5 of the FILES record. The names and datatypes of the columns are stored there for use in SQL statement generation. A connection is also established for SQL Server. This is similar to an open file and uses the network named pipe feature. The connection has a process number and data segment which are stored in a labelled common variable. The latter holds data required for the supporting C routines and takes about 10K of string space. Note that each workstation can have multiple connections to SQL Server, which is common in multi-user environments like Windows and OS/2.

With all of these pointers established, SQL tables can be listed, edited, and programmed like any other Arev file. The MSSQL_BFS portion of the bond translates each open, read, write, delete, lock, and so on into appropriate SQL statements and transmits them over the process connection. The query results are stored in a buffer and converted to @ID and @RECORD one row at a time. Arev routines never see the SQL statements and consequently work with the data in a normal fashion.

Bonded Files

Applying this discussion to data loading means writing an R/BASIC program that reads a linear hash or ASCII source file and writes records to a bonded SQL table. The source code follows:

0001 /*
0002    Wolff Data Systems
0003    Arev copy to SQL example
0004    Bonded SQL Server file with R/BASIC WRITE record to table - 64
0005    minutes
0006 */
0007 open 'MASTER' to f_master then
0008    open 'MASTER_SQL' to f_master_sql then
0009       select f_master
0010       done = '' ; count = 0 ; error = ''
0011       call dostime( start)
0012       loop until done
0013          count += 1
0014          print @( 0, 21): count:
0015          readnext k_master then
0016             read r_master from f_master, k_master then
0017                r_master_sql = field( r_master, @fm, 1, 7)
0018                r_master_sql< 8> = r_master< 8, 1>
0019                r_master_sql< 9> = r_master< 8, 2>
0020                r_master_sql< 10> = r_master< 8, 3>
0021                r_master_sql< 11> = field( r_master, @fm, 9, 5)
0022                r_master_sql< 16> = field( r_master, @fm, 15, 2)
0023                write r_master_sql on f_master_sql, k_master else
0024                   error< -1> = k_master
0025                end
0026             end
0027          end else done = 1
0028       repeat
0029       call dostime( finish)
0030       call msg( finish - start, '', '', '')
0031       oswrite error on 'error.sql'
0032    end
0033 end

In this example, MASTER is a linear hash file on a network drive and MASTER_SQL is a SQL Server table attached with the bond. Each record in the source file is read, normalized, and counted. The elapsed time is recorded and displayed with a call to MSG.
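To make the bond's overhead concrete, each WRITE on line 23 is turned into an INSERT behind the scenes. As a rough sketch only (the column values are invented, and the real MASTER_SQL table carries the key plus seventeen data columns), the generated statement is shaped like:

insert into MASTER_SQL values ('1001', 'SMITH', 'PHILADELPHIA', NULL, '12-25-1991')

The key appears as the first column, empty Arev fields travel as NULL, and dates and numbers are converted using the dictionary before the values are formatted.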
Lines 18 through 20 convert a multi-valued address field to three SQL columns. This method is easy to program for an Arev veteran but generates excessive overhead for the bond. Each write forces the bond to call a subroutine that builds a SQL insert statement. The dictionary must be read to handle date and numeric conversions. The query is transmitted over the volume connection and tested for errors; @FILE.ERROR is set if there is a problem. This routine converted 8,500 records in 64 minutes on a 386/33 system with Ethernet adapters.

EXEC_TRANS_SQL Bypasses the Bond

Since each source record is parsed and converted to a new string for writing to the target file, we can skip a step by building the INSERT statement at the same time. The bond routine will not be called, so overhead is reduced. The EXEC_TRANS_SQL subroutine provides this facility. It accepts a SQL statement and transmits it to SQL Server using the volume connection for a previously attached SQL table. The code is similar with some notable exceptions:

0001 /*
0002    Wolff Data Systems
0003    Arev copy with exec_trans example
0004    Exec_Trans_SQL call to insert each record - 27 minutes
0005 */
0006 declare subroutine exec_trans_sql
0007 open 'MASTER' to f_master then
0008    * read records
0009    select f_master
0010    done = ''
0011    count = 0
0012    p_dos = 0
0013    error = ''
0014    exec_trans_sql( 'TEST_SQL', 'use test', results)
0015    call dostime( start)
0016    loop until done
0017       count += 1
0018       readnext k_master then
0019          read r_master from f_master, k_master then
0020
0021             r_master_sql = '(': k_master: ',"': field( r_master, @fm, 1, 7)
0022             r_master_sql< 8> = r_master< 8, 1>
0023             r_master_sql< 9> = r_master< 8, 2>
0024             r_master_sql< 10> = r_master< 8, 3>
0025             r_master_sql< 11> = field( r_master, @fm, 9, 3)
0026             r_master_sql< 14> = oconv( r_master< 12>, 'D4-')
0027             r_master_sql< 15> = oconv( r_master< 13>, 'D4-')
0028             r_master_sql< 16> = r_master< 15>
0029             r_master_sql< 17> = r_master< 16>: '")'
0030             swap @fm with '","' in r_master_sql
0031             r_master_sql[ 1, 0] = 'insert into master_sql values '
0032             exec_trans_sql( 'TEST_SQL', r_master_sql, results)
0033             print @( 0, 21): count 'R#5':
0034             if @file.error then error< -1> = @file.error
0035          end
0036       end else done = 1
0037    repeat
0038    call dostime( finish)
0039    call msg( finish - start, '', '', '')
0040    oswrite error on 'error.sql'
0041 end

Line 14 is a simple call to SQL Server that ensures the current database is correct; this is similar to an Arev LOGTO. Line 21 incorporates the key into the data values. Lines 26 and 27 supply the necessary date conversions directly without referencing the data dictionary. Line 31 adds the SQL verb and table name to the beginning of the string, and line 32 transmits the completed statement to the server. RESULTS is a variable used for query results. INSERT statements do not generate results, so this variable is not tested, but @FILE.ERROR is.

Generating the SQL statement directly in the conversion routine cuts processing time in half. Applying the date conversion right in the code loop also saves time. The test took 27 minutes.
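To trace the string handling, consider a hypothetical record with key 1001 (the data values are invented, and the statement is abbreviated to the key plus four of the seventeen data columns). After the SWAP of @FM with '","' on line 30 and the prefix added on line 31, the completed string transmitted on line 32 would look like:

insert into master_sql values (1001,"SMITH","108 ELM ST","12-25-1991","V45")

The key travels unquoted while the SWAP wraps every remaining value in double quotes, which the server accepts as character values.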
(Volume 4, Issue 1, Pages 6-8)