Clinical Metadata Uploading#
Work packages 2, 3, 5 and 6 include harmonising and merging phenotype data, multi-omics data and information about biological samples for existing and upcoming patient and population cohorts within ImmUniverse. To facilitate this task, detailed metadata about the cohorts needs to be collected.
Data Dictionary#
What should be included in the data dictionary#
The data dictionary (sometimes also called a codebook) describes the
variables available for a cohort. Information should include labels,
detailed descriptions, the data type (`numeric`, `text`, `date`, ...),
allowed values, etc.
A machine-readable version of your data dictionary (preferably as an Excel or comma-separated values (CSV) file) is required.
At a later stage, the variables of the data dictionary need to be matched to the ImmUniverse glossary (the OMOP common data model to be used for all cohorts, deliverable D4.2), which will allow us to transform the data and load it into ImmUniverse’s data and analysis platform.
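As an illustration of what a machine-readable data dictionary in CSV form could look like, the following minimal sketch writes two rows and reads them back with Python's standard `csv` module. The column names follow the fields described in the chapter Data Dictionary Template details, but their exact spelling in the real template is an assumption. The second variable (`qest02`), the descriptions, the unit and the allowed values are hypothetical examples and not taken from any cohort.

```python
import csv
import io

# Column names follow the template fields described in this document;
# the exact header spelling in the real template may differ (assumption).
COLUMNS = [
    "variable/field name",
    "table name/file name",
    "field label",
    "field description",
    "required",
    "type",
    "unit",
    "allowed values",
]

# Hypothetical example rows, for illustration only.
ROWS = [
    {
        "variable/field name": "qest01",
        "table name/file name": "demographics",
        "field label": "age",
        "field description": "Age of the participant at enrolment",
        "required": "yes",
        "type": "integer",
        "unit": "years",
        "allowed values": "",
    },
    {
        "variable/field name": "qest02",
        "table name/file name": "demographics",
        "field label": "smoking status",
        "field description": "Self-reported smoking status",
        "required": "no",
        "type": "categorical",
        "unit": "",
        # Allowed values are separated by newlines inside one cell.
        "allowed values": "never\nformer\ncurrent",
    },
]


def write_data_dictionary(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_data_dictionary(handle):
    """Read a CSV data dictionary into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO(newline="")
write_data_dictionary(ROWS, buffer)
buffer.seek(0)
loaded = read_data_dictionary(buffer)
print(loaded[1]["allowed values"].split("\n"))
```

When writing to a real file, open it with `newline=""` as recommended by the `csv` module documentation, so that newlines inside cells are preserved.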
How to create a new data dictionary#
An example of a data dictionary is available for downloading.
If you do not have a machine-readable data dictionary available, you can start creating one using this template (please do not forget to delete the examples contained in it).
The header cells in the template show detailed information (e.g. how to fill in the respective column) when selected. The same information is repeated below in the chapter Data Dictionary Template details.
How to modify an existing data dictionary#
If you have a data dictionary but are not sure whether the format is suitable, please get in touch with us (see Contact information) before spending time on converting it; we may be able to automate part of the process. This is especially true if your data dictionary contains many variables.
How to upload a data dictionary#
Visit ImmUniverse-REDCap and log in with your LUMS account. In case you have trouble logging in, please contact us.
In the REDCap subsection Data Collection Instrument: General Cohort Information, you can add the data dictionary file at Upload data dictionary according to the CRF.
Data Dictionary Template details#
The following describes the fields in the data dictionary template. The same information is available as a tooltip on the header row in the template itself.
- ImmUniverse category
A category as defined by the ImmUniverse glossary team. Choose a category from the drop-down list. This field is only required if your cohort was prioritised.
- variable/field name
The variable name or identifier (e.g. the name of the column in the
database or the column header in an Excel sheet containing the data). It is
not necessarily related to its content; for that, use `field label`.
- reference to external data model
A reference to an external data model (e.g. `ICD10`, `SDTM`, …). If
you use a standardised nomenclature to capture values, this is the place
to refer to it.
- variable name in external data model
The field or variable in the external data model, if applicable (e.g.
`race_concept_id`, `value_as_number` in `OMOP`).
- visit/collection
Provide a date or identifier of the visit. In case the visit is stored in a different variable, refer to that variable. In longitudinal studies, the same observations exist multiple times. To distinguish these, the visit has to be specified.
- ascertainment method
The method used to ascertain the value. For example, a value may have been reported by a GP or a specialist, or it was taken from a questionnaire.
- table name/file name
A subcategory of variables. Could be based on the table name of the
underlying database, or some other means of categorising the variables
(e.g. `demographics`, `laboratory`, …).
- field label
A concise label of the variable. In contrast to `variable/field name`,
it must be related to the recorded data (e.g. a variable that describes
the age might have the `variable/field name` `qest01` and the
`field label` `age`).
- field description
A description that explains exactly what the variable refers to. Please provide specific details, such as definitions and descriptions from the protocol, if available.
- required
Set to `yes` if the value is mandatory.
- type
The type of the data. Supported are:
- categorical
- numeric
- integer
- float
- date
- time
- datetime
- string, text
Use string/text as little as possible.
- unit
Unit of measurement (e.g. `m`, `mmHg`, …).
- allowed values
Possible values the variable could take. Separate values by newline (press Alt-Enter to insert a newline in a cell).
- allowed value mapping
Possible values the variable could take, and a mapping of each value to its meaning. Separate each value from its meaning with a colon, and separate different mappings by newline (press Alt-Enter to insert a newline in a cell).
- datetime pattern
The format of the date or time. Example: yymmdd
Available are:
- `cc` century
- `yy` year in abbreviated form
- `mm` month
- `HH` hour
- `MM` minutes
- `ss` seconds
- field size
The maximal allowed number of characters.
- regular expression
A pattern the string should match.
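To illustrate how the fields allowed value mapping, datetime pattern, field size and regular expression might be interpreted by a script, here is a minimal Python sketch. It is not the tooling used by ImmUniverse; all function names are hypothetical. Two assumptions are worth stating: the token `dd` (day) appears in the example pattern `yymmdd` but not in the token list, so reading it as the day of the month is an assumption, and the regular expression is assumed to have to match the whole string.

```python
import re
from datetime import datetime


def parse_value_mapping(cell):
    """Parse an 'allowed value mapping' cell: one 'value:meaning' per line."""
    mapping = {}
    for line in cell.splitlines():
        if not line.strip():
            continue  # skip empty lines
        value, _, meaning = line.partition(":")
        mapping[value.strip()] = meaning.strip()
    return mapping


# Template tokens mapped to Python strptime directives.
# 'dd' is an assumption based on the example pattern 'yymmdd'.
DIRECTIVES = {
    "yy": "%y",
    "mm": "%m",
    "dd": "%d",
    "HH": "%H",
    "MM": "%M",
    "ss": "%S",
}


def to_strptime(pattern):
    """Translate a template pattern such as 'yymmdd' into a strptime format.

    'ccyy' (century plus abbreviated year) becomes the four-digit year '%Y'.
    A 'cc' that is not followed by 'yy' has no strptime equivalent and is
    passed through as literal text, like any other unknown character.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("ccyy", i):
            out.append("%Y")
            i += 4
        elif pattern[i:i + 2] in DIRECTIVES:
            out.append(DIRECTIVES[pattern[i:i + 2]])
            i += 2
        else:
            # Literal separator; a literal percent sign must be escaped.
            out.append("%%" if pattern[i] == "%" else pattern[i])
            i += 1
    return "".join(out)


def check_text(value, field_size=None, regex=None):
    """Return a list of problems for a text value (empty list means OK)."""
    problems = []
    if field_size is not None and len(value) > field_size:
        problems.append("too long")
    # Assumption: the pattern must match the whole string.
    if regex is not None and re.fullmatch(regex, value) is None:
        problems.append("pattern mismatch")
    return problems


print(parse_value_mapping("0:no\n1:yes"))
print(to_strptime("ccyy-mm-dd HH:MM:ss"))
print(datetime.strptime("240315", to_strptime("yymmdd")).date())
print(check_text("AB1234", field_size=5, regex=r"[A-Z]{2}\d{3}"))
```

The tokens are case-sensitive: `mm` is the month and `MM` the minutes, which is why the translation works on exact two-character matches.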