This method for accessing the AACT database is recommended for SAS users wanting to run analyses or queries using the live database.
SAS software must be installed. SAS/ACCESS interface to PostgreSQL is required.
Note: These instructions were tested on SAS 9.4 (TS1M2) installed on a PC and on Linux.
If the steps described below do not work, the SAS/ACCESS interface to ODBC may need to be configured. For help, contact your SAS administrator, or refer to the SAS knowledgebase.
Using the LIBNAME statement in SAS, assign a libref (a shortcut name) to the PostgreSQL database, and specify the database connection options. The access credentials for the live AACT database are provided at the AACT Connect webpage.
libname cloud /* my libref */ postgres /* postgres engine */ user="aact" /* supply the AACT2 login credentials */ password="aact" database="aact" server="aact-prod.cr4nrslb1lw7.us-east-1.rds.amazonaws.com" port=5432 dbmax_text=32767 /* sets the length of long text variables */ preserve_tab_names=yes access=readonly ;
Note: In the PostgreSQL database, character variables are stored either as varchar(n) which is a ‘variable-length’ character type with a maximum limit on length, or as text character type which has unlimited length. SAS only has one type of character variable. For character variables of varchar(n) type, SAS will use the length limit assigned in the PostgreSQL database (default limit is n=255) for the variable. The dbmax_text= option in the libname statement is used to control the length assigned to character variables of text type. The maximum length available in SAS is 32767. Setting dbmax_text=32767 will minimize the risk of truncating information contained within these long variables. However, it may also result in excessively long variables that will increase storage and memory requirements, and reduce computational performance. When working with a particular dataset, the maximum character variable length may be set for that dataset using the dbmax_text data set option (see example in Step 3, below).
Refer to SAS documentation for additional PostgreSQL options and arguments for the LIBNAME statement.
If you are working in PC SAS, then after submitting the above LIBNAME statement you will be able to view the available database tables as SAS datasets using the Explorer window.
In these examples, the assigned libref is called ‘cloud’.
proc contents data=cloud._all_; run;
title 'Number of studies by type'; proc freq data=cloud.studies; tables study_type; run; title;
* - count number of records in STUDIES table; * - count number of records by STUDY_TYPE; proc sql; title 'Number of studies'; select count(*) from cloud.studies; title 'Number of studies by type'; select study_type, count(*) as Count from cloud.studies group by study_type; quit; title;
* !!! WARNING - this may take a long time !!!; * Consider extracting just a subset of records (e.g., 1000) first; * In the example below, the maximum length for character variables stored as ‘text’ type in the postgreSQL database is set at 1000; libname sasdata "<my file path>" ; /* folder for storing dataset */ data sasdata.studies; set cloud.studies (dbmax_text=1000 obs=1000); run;
Running this code in SAS will save a SAS dataset called ‘studies.sas7bdat’ in the file path specified by the libname sasdata … statement. This dataset will contain only the first 1000 records (obs=1000 argument) from the studies table in the live database.
* !!! WARNING – this may take a long time !!!; * For example, the following code took >2.5 h to run on PC SAS; * over a home internet connection; proc copy in = cloud out = sasdata ; select studies sponsors ; /* select a subset of database tables */ run;
Running this code in SAS will save two SAS datasets, one called ‘studies.sas7bdat’ and the second called ‘sponsors.sas7bdat’. These will be saved in the file path specified by the libname sasdata … statement.
For SAS users intending to use SAS PROC SQL to connect to and query the live AACT database, the following method may be used in place of the methods described in Step 2 and Step 3.
In general, the connection method described in Step 2 above that uses a SAS/ACCESS LIBNAME statement with postgres engine name for the PostgreSQL interface is recommended. However, in some situations, connection using the PROC SQL Pass-Through Facility may be needed. For a description of advantages and disadvantages of the two approaches, refer to this information provided by SAS.
The following example illustrates use of the Pass-Through Facility in SAS PROC SQL. The connect to statement establishes the connection to the live PostgreSQL database and provides the database access credentials; the disconnect from statement closes the connection. This first ‘select’ query returns the total count of studies in the database; the second ‘select’ query counts the number of studies by values of the study_type variable.
proc sql; connect to postgres as aact ( user="aact" password="aact" database="aact" server="aact-prod.cr4nrslb1lw7.us-east-1.rds.amazonaws.com" port=5432 ); title 'Number of studies'; select count(*) from connection to aact (select nct_id from studies); title 'Number of studies by type'; select study_type, count(*) as Count from connection to aact (select study_type from studies) group by study_type; disconnect from aact; quit; title;
You will need the following information to access the database directly in the cloud.
Hostname: aact-prod.cr4nrslb1lw7.us-east-1.rds.amazonaws.com Port: 5432 Database name: aact User name: aact Password: aact