ImmUniverse

Clinical Metadata Uploading#

Work packages 2, 3, 5 and 6 include harmonising and merging phenotype data, multi-omics data and information about biological samples for existing and upcoming patient and population cohorts within ImmUniverse. To facilitate this task, detailed metadata about the cohorts needs to be collected.

Data Dictionary#

What should be included in the data dictionary#

The data dictionary (sometimes also called a codebook) describes the variables available for a cohort. Information should include labels, detailed descriptions, the data type (numeric, text, date, ...), allowed values, etc.

A machine-readable version of your data dictionary is required, preferably as an Excel or comma-separated values (CSV) file.

At a later stage, the variables of the data dictionary need to be matched to the ImmUniverse glossary (the OMOP common data model to be used for all cohorts, deliverable D4.2). This matching will allow us to transform the data and load it into ImmUniverse’s data and analysis platform.

How to create a new data dictionary#

An example of a data dictionary is available for downloading.

If you do not have a machine-readable data dictionary available, you can start creating one using this template (please do not forget to delete the examples contained in it).

The header cells in the template show detailed information (e.g. how to fill in the respective column) when selected. The same information is repeated below in the chapter Data Dictionary Template details.

How to modify an existing data dictionary#

If you have a data dictionary but are not sure whether its format is suitable, please get in touch with us (see Contact information) before spending time on converting it. We may be able to automate part of the process, especially if your data dictionary contains many variables.

How to upload a data dictionary#

Visit ImmUniverse-REDCap and log in with your LUMS account. In case you have trouble logging in, please contact us.

In the REDCap subsection Data Collection Instrument: General Cohort Information, you can add the data dictionary file at Upload data dictionary according to the CRF.


Data Dictionary Template details#

The following describes the fields in the data dictionary template. The same information is available as a tooltip on the header row of the template itself.

This field is only required if your cohort was prioritised. A category as defined by the ImmUniverse glossary team. Choose a category from the drop-down list.

The variable name or identifier (e.g. the name of the column in the database, or the column header in an Excel sheet containing the data). The name is not necessarily related to the variable's content; use the field label for that.

A reference to an external data model (e.g. ICD10, SDTM, …). If you use a standardised nomenclature to capture values, this is the place to refer to it.

The field or variable in the external data model, if applicable (e.g. race_concept_id, value_as_number in OMOP).

Provide a date or identifier of the visit. In case the visit is stored in a different variable, refer to that variable. In longitudinal studies, the same observations exist multiple times. To distinguish these, the visit has to be specified.

The method used to ascertain the value. For example, a value may have been reported by a GP or a specialist, or it was taken from a questionnaire.

A subcategory of variables. Could be based on the table name of the underlying database, or some other means of categorising the variables (e.g. demographics, laboratory…).

A concise label of the variable. In contrast to the variable/field name, it must be related to the recorded data (e.g. a variable that describes the age might have the variable/field name qest01 and the field label age).

A description that explains exactly what the variable refers to. Please provide specific details such as definitions and descriptions from the protocol, if available.

Set to yes if the value is mandatory.

The type of the data. The supported types are:

-   categorical

-   numeric

-   integer

-   float

-   date

-   time

-   datetime

-   string, text

Use string/text as little as possible.
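As a rough illustration of how the data type column could be checked before uploading, the following sketch normalises a cell value and compares it with the list above. The function name `normalise_data_type` is hypothetical, and accepting both `string` and `text` as well as ignoring case and surrounding whitespace are assumptions rather than documented behaviour of the template.

```python
# Supported data types as listed above; "string" and "text" are both accepted
# here (assumption: they are two spellings of the same free-text type).
SUPPORTED_TYPES = {
    "categorical", "numeric", "integer", "float",
    "date", "time", "datetime", "string", "text",
}


def normalise_data_type(raw: str) -> str:
    """Return the lower-cased data type, or raise if it is not supported."""
    cleaned = raw.strip().lower()  # assumption: case and padding are ignored
    if cleaned not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported data type: {raw!r}")
    return cleaned
```

A call such as `normalise_data_type(" Integer ")` returns `"integer"`, while a value outside the list, for example `"boolean"`, raises a `ValueError`.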

Unit of measurement (e.g. m, mmHg,…)

Possible values the variable could take. Separate values by newline (press Alt-Enter to insert a newline in a cell).

Possible values the variable could take, and a mapping to its meaning. Separate values and meaning by colon, separate different mappings by newline (press Alt-Enter to insert a newline in a cell).
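The two cell formats described above (one value per line, and one `value: meaning` pair per line) could be read back with a small helper like the one sketched below. The function names are hypothetical, and splitting on the first colon only, so that a meaning may itself contain a colon, is an assumption.

```python
def parse_allowed_values(cell: str) -> list[str]:
    """Split a cell with one allowed value per line into a list."""
    return [line.strip() for line in cell.splitlines() if line.strip()]


def parse_value_mapping(cell: str) -> dict[str, str]:
    """Parse lines of the form 'value: meaning' into a dictionary."""
    mapping = {}
    for line in cell.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines inside the cell
        # Assumption: only the first colon separates value and meaning.
        value, sep, meaning = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in mapping line: {line!r}")
        mapping[value.strip()] = meaning.strip()
    return mapping
```

For example, a cell containing `0: no` and `1: yes` on separate lines would yield `{"0": "no", "1": "yes"}`.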

The format of the date or time. Example: yymmdd

The available format tokens are:

-   `cc` century

-   `yy` year in abbreviated form

-   `mm` month

-   `HH` hour

-   `MM` minutes

-   `ss` seconds
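To check dates against such a format programmatically, the tokens can be translated into directives for Python's `datetime.strptime`. The sketch below is one possible mapping, not part of the template. It makes two assumptions: `dd` stands for the day (it appears in the example `yymmdd` but not in the list above), and `cc` is only handled directly before `yy`, because `strptime` has no directive for a century on its own. The function name `to_strptime_format` is hypothetical.

```python
from datetime import datetime

# Template tokens and their strptime directives. Longer tokens come first so
# that "ccyy" is matched before "yy". Tokens are case-sensitive: "mm" is the
# month, "MM" the minutes.
TOKENS = [
    ("ccyy", "%Y"),  # century + abbreviated year = four-digit year
    ("yy", "%y"),
    ("mm", "%m"),
    ("dd", "%d"),    # assumption: day, inferred from the example "yymmdd"
    ("HH", "%H"),
    ("MM", "%M"),
    ("ss", "%S"),
]


def to_strptime_format(template_format: str) -> str:
    """Translate a template date/time format into a strptime format string."""
    out = []
    i = 0
    while i < len(template_format):
        for token, directive in TOKENS:
            if template_format.startswith(token, i):
                out.append(directive)
                i += len(token)
                break
        else:
            if template_format.startswith("cc", i):
                raise ValueError("'cc' is only supported directly before 'yy'")
            # Any other character (e.g. "-" or ":") is kept as a literal.
            out.append(template_format[i].replace("%", "%%"))
            i += 1
    return "".join(out)


parsed = datetime.strptime("240131", to_strptime_format("yymmdd"))
```

Here `to_strptime_format("yymmdd")` gives `"%y%m%d"`, so the string `240131` is parsed as 31 January 2024.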

The maximal allowed number of characters.

A pattern the string should match.
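The last two fields can be combined into a simple check of a string value. The following is a minimal sketch that assumes the pattern is a regular expression that must match the whole value; the template does not state which pattern syntax is expected, and the function name `check_string` is hypothetical.

```python
import re
from typing import Optional


def check_string(
    value: str,
    max_length: Optional[int] = None,
    pattern: Optional[str] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means the value passes."""
    problems = []
    if max_length is not None and len(value) > max_length:
        problems.append(f"longer than {max_length} characters")
    # Assumption: the pattern is a regular expression covering the full value.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        problems.append(f"does not match pattern {pattern!r}")
    return problems
```

With a maximum length of 5 and the pattern `[A-Z]{2}\d{3}`, the value `AB123` passes, whereas `AB1234` is reported as both too long and not matching the pattern.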