Skip to content

DC Data Catalog

In any SAS estate, it's unlikely the size & shape of data will remain static. By running a regular Catalog Scan, you can track changes such as:

  • Library Properties (size, schema, path, number of tables)
  • Table Properties (size, number of columns, primary keys)
  • Variable Properties (presence in a primary key, constraints, position in the dataset)
  • SAS Catalog Properties (number of entries, created / modified datetimes)
  • SAS Catalog Object properties (entry name, type, description, created / modified datetimes)

The data is stored with SCD2 so you can actually track changes to your model over time! Curious when that new column appeared? Just check the history in MPE_DATACATALOG_TABS.

The Catalog does not contain information about the data content (values). It is based primarily on the existing SAS dictionary tables, augmented with attributes such as primary key fields, filesize / libsize, and number of observations (eg for database tables).

Frequently changing data (such as nobs, size) are stored on the MPE_DATASTATUS_XXX tables. The rest is stored on the MPE_DATACATALOG_XXX tables.

Tables

Libraries

This table contains library level attributes to provide a high level overview of data coverage. Note that unless you are an administrator, you are unlikely to have the ability to view / open all of these libraries. To avoid errors when opening invalid libraries, you can add pipe-separated LIBREFs to the DCXXXX.MPE_CONFIG table (var_scope='DC_CATALOG', var_name='DC_IGNORELIBS').

Tables

Table attributes are split between those that change infrequently (eg PK_FIELDS) and those that change often (eg size, modified date, and NOBS).

Variables

Variable attributes come from dictionary tables with an extra PK indicator. A PK is identified by the fact the variable is within an index that is both UNIQUE and NOTNULL. Variable names are always uppercase.

Catalogs & Objects

This info comes from the dictionary.catalogs table. The catalog created / modified time is considered to be the earliest created time / latest modified time of the underlying objects.

Assumptions

The following assumptions are made:

  • Data Models (eg attributes) are not sensitive. If so the catalog tables should be disabled.
  • Users can see all tables in the libraries they can access. The refresh process will close out any tables that are not found, if the user can see at least one table in a library.
  • For a particular site, libraries are unique on LIBREF.

If you have duplicate librefs, specific table security setups, or sensitive models - contact us.

Refreshing the Data Catalog

The update process for INDIVIDUAL libraries can be run by any user, and is performed in the VIEW menu by expanding a library definition and clicking the refresh icon next to the library name.

Members of the admin group may run the refresh process for ALL libraries by clicking the REFRESH button on the System page.

When doing a full scan, the following LIBREFS are ignored:

  • 'CASUSER'
  • 'MAPSGFK'
  • 'SASUSER'
  • 'SASWORK
  • 'STPSAMP'
  • 'TEMP'
  • `WORK'

Additional LIBREFs can be excluded by adding them to the DCXXXX.MPE_CONFIG table (where var_scope='DC_CATALOG' and var_name='DC_IGNORELIBS'). Use a pipe (|) symbol to seperate them. This can be useful where there are connection issues for a particular library.

Be aware that the scan process can take a long time if you have a lot of tables!

Output tables (all SCD2):