True BASIC Program to Calculate

General Multiple Linear Regression

by Namir Shammas

This program calculates the regression coefficients for the following model:

G(Y) = C0 + C1 F1(X1) + C2 F2(X2) + ... + Cn Fn(Xn)

where X1, X2, ..., Xn are the independent variables and Y is the dependent variable. G(), F1(), F2(), and so on are optional transformation functions for the regression variables. The program also calculates the coefficient of determination, R-Square.

The program displays the following menu:

                   

            MULTIPLE LINEAR REGRESSION
            ==========================
0) QUIT
1) KEYBOARD INPUT
2) DATA STATEMENT INPUT
3) FILE INPUT
4) FILE OUTPUT
5) CALCULATE REGRESSION
SELECT CHOICE BY NUMBER: 

Option 1 allows you to enter data from the keyboard. This option performs the following tasks:

1. Prompts you for the number of independent variables.

2. Prompts you for the number of observations.

3. Prompts you for the values in the matrix of independent variables. REMEMBER THAT THE FIRST COLUMN OF THIS MATRIX IS ONES.

4. Prompts you for the values of the dependent variable Y.

Option 2 permits you to obtain data from the program's own DATA statements. The DATA statements must provide values for:

1. The number of independent variables.

2. The number of observations.

3. The values in the matrix of independent variables. REMEMBER THAT THE FIRST COLUMN OF THIS MATRIX IS ONES.

4. The values of the dependent variable Y.

Option 3 allows you to obtain data from a text file. The program prompts you for the input filename. This text file contains values for the following (each value must appear on a separate text line):

1. The number of independent variables.

2. The number of observations.

3. The values in the matrix of independent variables. REMEMBER THAT THE FIRST COLUMN OF THIS MATRIX IS ONES.

4. The values of the dependent variable Y.
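As an illustration of this layout, the sample data set from the program's DATA statements could be stored as the file below. This is a sketch that assumes the row-by-row order used by the listing's file-input loop: the number of independent variables, the number of observations, each row of the X matrix (starting with its 1), and finally the Y values.

```text
3
5
1
7
25
6
1
1
29
15
1
11
56
8
1
11
31
8
1
7
52
6
60
52
20
47
33
```

The file holds 2 + 5*4 + 5 = 27 values in all, one per line.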

Once the program reads the data from the file, it asks whether you want to transform the data (using the code in subroutine TRSNF). Enter Y or Yes to proceed with the data transformation. Otherwise, enter N or No to bypass the transformation step.

Option 4 allows you to store the current data to a text file. The program prompts you for the output filename.

Option 5 runs the multiple regression calculations, which perform the following tasks:

1. Calculates and displays the regression coefficients C(0), C(1), and so on.

2. Calculates and displays the coefficient of determination R-Square.
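In matrix terms, these two tasks appear to correspond to the normal equations as coded in the listing's MAT statements. The symbols here are chosen for illustration: X is the data matrix including the column of ones, Y is the column of (possibly transformed) observations y_i, C is the column of coefficients, and n is the number of observations.

```latex
\[
C = (X^{T} X)^{-1} X^{T} Y
\]
\[
R^{2} = \frac{C^{T} X^{T} Y - \left(\sum_{i=1}^{n} y_i\right)^{2} / n}
             {\sum_{i=1}^{n} y_i^{2} - \left(\sum_{i=1}^{n} y_i\right)^{2} / n}
\]
```

In the listing, X0 holds the transpose of X, Y1 holds the product of that transpose with Y, and SUMCX accumulates the product of the coefficients with Y1.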

The DATA statements contain the data shown in the next table (note that X0 is a dummy variable that represents the column of 1's):

X0 X1 X2 X3 Y
1 7 25 6 60
1 1 29 15 52
1 11 56 8 20
1 11 31 8 47
1 7 52 6 33

The above data yield the following results:

C( 0 ) =103.447316589

C( 1 ) =-1.28409650404

C( 2 ) =-1.03692762188

C( 3 ) =-1.33948793673

R2 = 0.998937219108
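As a quick sanity check on these coefficients, substituting the first two rows of the table into the fitted model gives values close to the observed Y values of 60 and 52. The figures below are worked by hand from the coefficients printed above, rounded to four decimal places:

```latex
\[
\hat{y}_1 = 103.4473 - 1.2841(7) - 1.0369(25) - 1.3395(6) \approx 60.50
\]
\[
\hat{y}_2 = 103.4473 - 1.2841(1) - 1.0369(29) - 1.3395(15) \approx 52.00
\]
```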

Here is the BASIC listing:

OPTION TYPO
OPTION NOLET
! MULTIPLE LINEAR REGRESSION
DECLARE NUMERIC I, J, NDATA, NVARS, NVARSP1, C, R2, SUMY, SUMCX, SUMY2
DECLARE STRING A$
DIM X(1,1),Y(1,1),X0(1,1),X1(1,1),Y1(1,1),COEFF(1,1)
NVARS=0
NDATA=0
DO
  PRINT
  PRINT TAB(20);"MULTIPLE LINEAR REGRESSION"
  PRINT TAB(20);"=========================="
  PRINT "0) QUIT"
  PRINT "1) KEYBOARD INPUT"
  PRINT "2) DATA STATEMENT INPUT"
  PRINT "3) FILE INPUT"
  PRINT "4) FILE OUTPUT"
  PRINT "5) CALCULATE REGRESSION"
  INPUT PROMPT "SELECT CHOICE BY NUMBER: ":C
  IF C=1 THEN
    INPUT PROMPT "NUMBER OF X VARS? ": NVARS
    INPUT PROMPT "NUMBER OF POINTS? ": NDATA
    NVARSP1=NVARS+1
    MAT REDIM X(NDATA,NVARSP1),Y(NDATA,1),X0(NVARSP1,NDATA)
    MAT REDIM X1(NVARSP1,NVARSP1),Y1(NVARSP1,1),COEFF(NVARSP1,1)
    PRINT "ENTER VALUES FOR INDEPENDENT VARIABLES MATRIX X (WITH FIRST COLUMN AS ONES):"
    MAT INPUT X
    PRINT "ENTER VALUES FOR DEPENDENT VARIABLE Y:"
    MAT INPUT Y
    CALL TRSNF(X(,),Y(,),NDATA,NVARSP1)
  ELSEIF C=2 THEN
    WHEN ERROR IN
      READ NVARS, NDATA
      NVARSP1 = NVARS+1
      MAT REDIM X(NDATA,NVARSP1),Y(NDATA,1),X0(NVARSP1,NDATA)
      MAT REDIM X1(NVARSP1,NVARSP1),Y1(NVARSP1,1),COEFF(NVARSP1,1)
      PRINT "MATRIX X"
      MAT READ X
      MAT PRINT X
      PRINT
      PRINT "ARRAY Y"
      MAT READ Y
      MAT PRINT Y
      PRINT
      DATA 3,5
      DATA 1, 7, 25, 6
      DATA 1, 1, 29, 15
      DATA 1, 11, 56, 8
      DATA 1, 11, 31, 8
      DATA 1, 7, 52, 6
      DATA 60, 52, 20, 47, 33
      RESTORE
    USE
      PRINT "ERROR IN READING FROM DATA STATEMENTS"
      NDATA = 0
      NVARS = 0
    END WHEN
  ELSEIF C=3 THEN
    INPUT PROMPT "ENTER FILENAME? ":A$
    WHEN ERROR IN
      OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
      INPUT #1: NVARS
      INPUT #1: NDATA
      NVARSP1 = NVARS+1
      MAT REDIM X(NDATA,NVARSP1),Y(NDATA,1),X0(NVARSP1,NDATA)
      MAT REDIM X1(NVARSP1,NVARSP1),Y1(NVARSP1,1),COEFF(NVARSP1,1)
      PRINT "MATRIX X"
      FOR I = 1 TO NDATA
        FOR J = 1 TO NVARSP1
          INPUT #1: X(I,J)
          PRINT X(I,J);
        NEXT J
        PRINT
      NEXT I
      PRINT
      PRINT "ARRAY Y"
      FOR I = 1 TO NDATA
        INPUT #1: Y(I,1)
        PRINT Y(I,1);
      NEXT I
      PRINT
      CLOSE #1
      INPUT PROMPT "TRANSFORM DATA? (Y/N) ":A$
      IF UCASE$(A$)="Y" OR UCASE$(A$)="YES" THEN
        CALL TRSNF(X(,),Y(,),NDATA,NVARSP1)
      END IF
    USE
      PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
      NDATA = 0
      NVARS = 0
    END WHEN
  ELSEIF C=4 AND NVARS*NDATA>0 THEN
    INPUT PROMPT "ENTER FILENAME? ":A$
    WHEN ERROR IN
      OPEN #1: NAME A$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
      ERASE #1
      PRINT #1: NVARS
      PRINT #1: NDATA
      FOR I = 1 TO NDATA
        FOR J = 1 TO NVARSP1
          PRINT #1: X(I,J)
        NEXT J
      NEXT I
      FOR I = 1 TO NDATA
        PRINT #1: Y(I,1)
      NEXT I
      CLOSE #1
    USE
      PRINT "COULD NOT OPEN OR WRITE TO FILE ";A$
    END WHEN
  ELSEIF C=5 AND NVARS*NDATA>0 THEN
    MAT X0=TRN(X)
    MAT X1=X0*X
    MAT Y1=X0*Y
    MAT X1=INV(X1)
    MAT COEFF=X1*Y1
    FOR I=1 TO NVARSP1
      PRINT "COEFF(";I-1;")=";COEFF(I,1)
    NEXT I
    SUMY=0
    SUMCX=0
    SUMY2=0
    FOR I=1 TO NDATA
      SUMY=SUMY+Y(I,1)
      SUMY2=SUMY2+Y(I,1)^2
    NEXT I
    FOR I=1 TO NVARSP1
      SUMCX=SUMCX+COEFF(I,1)*Y1(I,1)
    NEXT I
    R2=(SUMCX-SUMY^2/NDATA)/(SUMY2-SUMY^2/NDATA)
    PRINT "R^2=";R2
  ELSE
    IF C<>0 THEN PRINT "INVALID CHOICE"
  END IF
  
  IF C<>0 THEN
    PRINT "PRESS ANY KEY TO CONTINUE";
    GET KEY I
  END IF
LOOP UNTIL C=0
PRINT "END OF PROGRAM"
END

SUB TRSNF(X(,),Y(,),NDATA,NVARSP1)
! DATA TRANSFORMATION
END SUB
 

The subroutine TRSNF is where you place any required data transformation statements. In the current version of the code, the subroutine contains no executable statements, which means that the multiple regression is strictly linear.

To perform a power regression for all the variables, for example, the subroutine TRSNF would look like:

SUB TRSNF(X(,),Y(,),NDATA,NVARSP1)
! DATA TRANSFORMATION
! LOOP COUNTERS ARE DECLARED LOCAL IN CASE OPTION TYPO IS IN EFFECT
LOCAL I, J
FOR I = 1 TO NDATA
  Y(I,1)=LOG(Y(I,1))
NEXT I
FOR I = 1 TO NDATA
  FOR J = 2 TO NVARSP1
    X(I,J) = LOG(X(I,J))
  NEXT J
NEXT I
END SUB
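The same pattern extends to other models. As a minimal sketch, an exponential regression of the form LN(Y) = C0 + C1 X1 + ... + Cn Xn would transform only the dependent variable and leave the X matrix as entered. The version below is an illustration rather than part of the original program. It assumes that every Y value is positive, and it declares its loop counter LOCAL in case OPTION TYPO is in effect for the subroutine.

```basic
SUB TRSNF(X(,),Y(,),NDATA,NVARSP1)
! DATA TRANSFORMATION FOR AN EXPONENTIAL MODEL (ILLUSTRATIVE SKETCH)
! ONLY Y IS TRANSFORMED; THE X MATRIX IS LEFT AS ENTERED
LOCAL I
FOR I = 1 TO NDATA
  ! LOG IS THE NATURAL LOGARITHM, SO EACH Y(I,1) MUST BE POSITIVE
  Y(I,1) = LOG(Y(I,1))
NEXT I
END SUB
```

With either transformation, the coefficients and R-Square that the program reports describe the transformed variables. For example, C(0) estimates the intercept of LN(Y), not of Y itself.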
 


Copyright (c) Namir Shammas. All rights reserved.