Policies

I. General Terms of Use | II. Privacy Policy | III. Community Norms | IV. Data Usage Agreement | V. Digital Preservation | VI. Information Security | VII. Deaccessioning Data

I. General Terms of Use [1]

Texas Data Repository users shall abide by all applicable local, state, national, and international laws and usage agreement regulations, and in addition shall not:

  1. infringe any patent, trademark, trade secret, copyright, or right of publicity of any other person or entity;
  2. be unlawful, harassing, defamatory, deceptive, fraudulent, or invasive of another’s privacy;
  3. spam;
  4. introduce into the service software viruses or any other computer codes, files, or programs that are intended to disrupt the proper function of any software or hardware or that are intended to obtain unauthorized access to any system, data files or other information of the Texas Data Repository or any third party;
  5. impose an unreasonable or disproportionately large load on the Texas Data Repository’s (or its third-party providers’) infrastructure (to be determined by the TDL in its sole discretion);
  6. interfere or attempt to interfere with the proper working of the service; or
  7. bypass any measures the TDL may use to prevent or restrict access to the service (or other accounts, computer systems, or networks connected to the service).

In using the Texas Data Repository service, researchers and users represent that they will follow the:

A. General TDL Rules of Conduct

These rules consist of the Privacy Policy and Community Site Norms. Whether you are a registered user or an unregistered guest, you are able to download publicly available content from the Texas Data Repository. You shall provide the Texas Data Repository with accurate and complete registration information and update such information to maintain its completeness and accuracy. Failure to do so may result in termination of your account. The Texas Data Repository reserves the right to refuse registration of or cancel an account.

You are solely responsible for activity that occurs on your account and shall be responsible for maintaining the confidentiality of your account password. You shall never use another user’s account without the other user’s permission. You will immediately notify the Texas Data Repository in writing of any unauthorized use of your account, or other account-related security breach.

In having an account on the service, you acknowledge that:

  1. your account details (including first name, last name, email address, and username) are searchable by other users;
  2. your account details may be recorded when you download datasets, and that information can then be viewed by the owner of the user upload; and
  3. your first name, last name, and affiliation will be displayed in connection with your user uploads.

By accessing or using an authentication key for the purpose of accessing or using the Texas Data Repository API (whether through Dataverse’s own API application software or through a third-party application), you acknowledge that you have fully read and accepted the API Terms of Use.

1. Texas Data Repository Services

The service provides you with the ability to post user uploads and display, organize, accept and distribute researcher uploads by creating a Dataverse. The service provides you with the ability to post user uploads to other Dataverses administered by other users. The service also allows you to share datasets and Dataverses with other users by searching for their first name, last name, email address, or username.

The service also gives each administrator the ability to change dataset access/download restrictions by designating user uploads as restricted user submissions. Datasets are by default unpublished, but administrators can modify the access restrictions and publish or deaccession any user uploads at any time. Depending on the permissions granted by the administrator of the Dataverse in question, depositors may also change dataset access/download restrictions by designating user uploads as restricted user submissions.

The Texas Data Repository has no obligation to monitor the site, service, content, or user uploads. The Texas Data Repository may remove any User Upload at any time for any reason or for no reason at all.

You acknowledge that the Texas Data Repository does not endorse, take responsibility for, or make any representations or warranties for any user uploads, and will not be liable for:

  1. researcher uploaded content, format, metadata, or lack thereof;
  2. representations or warranties made by the researcher about uploads;
  3. any loss of or damage to Uploads, either in whole or in part, from whatever cause.

2. Restrictions

In contributing data to the site, you must ensure that the data complies with the terms of use. If your user upload does not comply with the terms of use, the Texas Data Repository has the right in its sole discretion to take down your User Upload. The Texas Data Repository does not review all user uploads before they are made available on the site, or before they are published. Therefore, you will be held legally and financially responsible for all damages if content you contribute violates anything in this agreement.

By posting user uploads to your Dataverse or other Dataverses, or by allowing others to do so, you make the following representations and warranties to the Texas Data Repository:

  1. user uploads do not infringe upon the copyrights or other intellectual property rights, including, but not limited to, patent, trademark, trade secret, copyright, right of publicity or other right of any third party;
  2. user uploads do not violate any laws;
  3. in the event you become aware of any issues after submitting a User Upload, you will promptly notify Texas Data Repository and the relevant Dataverse administrator(s) of any confidentiality, privacy or data protection, licensing, or intellectual property issues regarding the User Uploads;
  4. user uploads do not contain software viruses or any other computer codes, files, or programs that are designed or intended to disrupt, damage, limit or interfere with the proper function of any software, hardware, or telecommunications equipment or to damage or obtain unauthorized access to any system, data files, or other information of Texas Data Repository or any third party;
  5. user uploads have been given all relevant, obligatory, and applicable approvals for posting such materials with the content included and in the format uploaded, including but not limited to approvals from the institutional review board and third parties with whom Users have relevant contractual obligations; and
  6. User uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and Dataverses) uploaded under any one author and/or user should not be possible. Specifically, user uploads cannot contain social security numbers; credit card numbers; medical record numbers; health plan numbers; other account numbers of individuals; or biometric identifiers (fingerprints, retina, voice print, DNA, etc.). The only exceptions for when identifiable information is allowed are when:

a. the information has been previously released to the public;

b. the information describes public figures, where the data relates to their public roles or other non-sensitive subjects;

c. a sufficient length of time has passed since the collection of the information;

d. all identified subjects have given explicit informed consent allowing the public release of the information in the dataset; or

e. all identified subjects are deceased and no federal statute explicitly restricts the release of the data (this exception is only for federal records where data is created by a U.S. federal government agency or under a federal contract).

3. Licenses and Permissions to Use

You grant to Texas Data Repository all necessary permissions and required licenses to make the content you submit or deposit available for archiving, preservation and access, within the site. This includes, without restriction, permission to:

  1. re-disseminate copies of the content in a variety of distribution formats according to the standard terms of use of Texas Data Repository;
  2. promote and advertise the content in any publicity (in any form) for Texas Data Repository;
  3. describe, catalog, and document the user submissions;
  4. store, translate, copy or re-format the content in any way to ensure its future preservation and accessibility, and improve usability and/or protect respondent confidentiality; and
  5. incorporate metadata or documentation in the content into public access catalogues.

You represent and warrant that you are lawfully entitled and have full authority to license to Texas Data Repository the content you submit or deposit in the ways described in these terms of use; and you are not under any obligation or restriction created by law, contract or otherwise that would prevent you from entering into and fully performing these terms of use.

None of the above supersedes any prior contractual obligations with third parties that require any information to be kept confidential. Nothing in this agreement obligates you to disclose information to Texas Data Repository if such information is otherwise confidential or proprietary. Texas Data Repository does not approve user uploads before they are posted; therefore, you are solely responsible for the user submissions you post on or through the service and all possible confidentiality or other privacy issues that may arise from your posting any user uploads.

4. Researcher Submission Data Usage License Agreement

You acknowledge that Texas Data Repository’s default data usage license agreement for all uploaded materials is a Creative Commons 0 (CC0) License. For more information, please visit CC0 1.0 Universal (CC0 1.0) Public Domain Dedication Full Legal Text (https://creativecommons.org/publicdomain/zero/1.0/).

Users also have the option of drafting a custom data usage license agreement. By choosing to draft a custom data usage license agreement for a particular dataset, users represent that:

  1. they have written themselves or have permission to use the language of the data usage license agreement they choose;
  2. the data usage license agreement covers all restrictions and protections they wish to retain and will not rely on the site to provide any further protections or restrictions;
  3. they are solely responsible for ensuring the data usage license agreement is legally sound and that the site is not responsible for anything included; and
  4. nothing in the data usage license agreement conflicts with, supersedes, or limits any prior contractual obligations on the part of the User, any third parties, downloaders, or the site.

Users also have the option of choosing to use Texas Data Repository’s restricted data usage license agreement (“Section IV Data Use Agreement”). By choosing to use the data use agreement, users acknowledge and agree that:

  1. Texas Data Repository owes no obligation or responsibility and makes no representations with regards to the legality, enforceability, accuracy, or desirability of the data use agreement;
  2. Texas Data Repository is not a party to the data use agreement and cannot be held accountable for any terms found within the data use agreement; and
  3. Texas Data Repository has no obligation to aid or support either party of the agreement in the execution or enforcement of the data use agreement’s terms.

Users are responsible for establishing, maintaining, and enforcing the license terms they wish to use for access to and use of user uploads. Texas Data Repository is not responsible for any inaccuracies, unenforceable terms, or liabilities that may arise from choosing any of the options afforded in this agreement, and Texas Data Repository will not be responsible for reviewing or enforcing compliance of any terms the User may choose to employ.

B. Downloading Researcher Submissions

The site represents that it will use all reasonable efforts to maintain open access to datasets for users to download, subject to depositors’ restrictions and any applicable legal restrictions. The site collects and stores download data from each download for all users (both registered and guest), which can then be downloaded and accessed by the depositor. Downloaders represent that, in downloading any material from the site, they:

  1. have read and understood the site’s Community Norms;
  2. will abide by the applicable data usage license agreement attached to the dataset;
  3. acknowledge that their account information (for users) or temporary site identification information (for guests) may be recorded upon download, which can then be viewed by the owner of the user upload; and
  4. have done their due diligence in ensuring that they do not download and use any datasets or other materials where prohibited by applicable law.

C. Termination of Services

Texas Data Repository may terminate your access to all or any part of the service at any time, with or without cause, with or without notice. If you wish to terminate your account, you may notify TDL or your local administrator at support@tdl.org.

All provisions of the terms of use which by their nature should survive termination shall survive termination, including, without limitation, ownership provisions, warranty disclaimers, indemnity and limitations of liability.

D. Texas Data Repository Warranties

The service (including, without limitation, all content and user uploads) is provided “as is” and “as available” and without warranty of any kind, express or implied, including, but not limited to, the implied warranties of title, non-infringement, merchantability and fitness for a particular purpose, and any warranties implied by any course of performance or usage of trade, all of which are expressly disclaimed. Without limiting the foregoing, Texas Data Repository does not warrant that:

  1. the content or user uploads are timely, accurate, complete, reliable or correct in their posted forms on the service;
  2. the service will be secure;
  3. the service will be available at any particular time or location;
  4. any defects or errors will be corrected;
  5. the site, content or any user uploads are free of viruses or other harmful components; or
  6. the results of using the service will meet your requirements.

Your use of the service and any content is solely at your own risk.

E. TDL Limitation of Liability

In no event shall Texas Data Repository and its affiliates, or their directors, employees, agents, partners, or suppliers, be liable under contract, tort, strict liability, negligence or any other legal theory with respect to the service or any content or user submissions (i) for any direct damages, or (ii) for any lost profits or special, indirect, incidental, punitive, or consequential damages of any kind whatsoever.

F. Indemnification

You will indemnify and hold TDL harmless from and against any and all loss, cost, expense, liability, or damage, including, without limitation, all reasonable attorneys’ fees and court costs, arising from:

a. your use or misuse of the service;

b. your access to the site;

c. your violation of the terms of use; or

d. infringement by you, or any third party using your account, of any intellectual property or other right of any person or entity, including but not limited to infringements upon any and all representations made by you in this agreement.

Such losses, costs, expenses, damages, or liabilities shall include, without limitation, all actual, general, special, and consequential damages.

G. Dispute Resolution

You and TDL agree that any cause of action arising out of or related to the service must commence within one (1) year after the cause of action arose; otherwise, such cause of action is permanently barred. These Terms of Use shall be governed by and interpreted in accordance with the laws of the state of Texas (excluding the conflict of laws rules thereof). All disputes under these Terms of Use will be resolved in the applicable state or federal courts of Texas. You consent to the jurisdiction of such courts and waive any jurisdictional or venue defenses otherwise available.

H. Integration and Severability

This agreement is the entire agreement between you and Texas Data Repository with respect to the service and use of this site, and supersedes all prior or contemporaneous communications and proposals (whether oral, written or electronic) between you and Texas Data Repository with respect to this site (but excluding the use of any third-party software, widgets, and applications that may be subject to a separate end-user license agreement). If any provision of the terms of use is found to be unenforceable or invalid, that provision will be limited or eliminated to the minimum extent necessary so that the terms of use will otherwise remain in full force and effect and enforceable.

I. Miscellaneous

Texas Data Repository may assign, transfer or delegate any of its rights and obligations hereunder without consent. No agency, partnership, joint venture, or employment relationship is created as a result of the terms of use and neither party has any authority of any kind to bind the other in any respect outside the specified terms of this agreement. In any action or proceeding to enforce rights under the terms of use, the prevailing party will be entitled to recover costs and attorneys’ fees.

II. Privacy Policy [2]

This privacy policy explains what information TDL collects through your use of the Texas Data Repository application and how we treat that information. By using the Texas Data Repository application, you acknowledge and accept that these are the privacy practices governing the Texas Data Repository. This web site may contain links to other web sites and use third-party applications and/or software. We are not responsible for the privacy practices of these third parties, and you should read through their practices before clicking or using them.

A. Information Provided by Researchers

Information Voluntarily Provided by You

When you register for an account with Texas Data Repository, we collect your name, email address, and (optionally) your institution and position.

Additionally, if the guestbook feature is activated for a particular dataset, then any information filled out in the guestbook is made available to the Texas Data Repository administrator, dataset manager, and dataset curator.

Information Collected Through Your Use of the Texas Data Repository Application

When you access this web site, our web server software generates log files of the IP address of your computer. These web server logs are retained on a temporary basis and then deleted from our systems. When you download a file from Texas Data Repository, our software collects user account data such as your name, username, email, institution and position if provided (or the session ID data for guest users) and accompanying download data such as the time of the download. This information is then made available to the Dataverse administrator, dataset manager, and dataset curator of the file.

B. Use of Information and Data

We use your IP address and files you access to help diagnose problems with our server and to administer our web site by identifying:

  1. which parts of our site are most heavily used
  2. which portion of our audience comes from within the TDL network

We also use this information to tailor site content to user needs and to generate aggregate statistical reports. We do not disclose site usage by individual users.

With the exception of the data collected during downloads, which are made available only to the file owner, we do not share any personally identifiable information we gather or develop about our users with any third parties for any purpose unless required by law. Any reports we may share externally would use unidentifiable, aggregated data.

C. Cookies

We use cookies to maintain a user’s identity between web sessions.

D. Security

This site has security measures in place to protect against the loss, misuse and alteration of the information under our control. See Section VI Information Security for more detailed outlines.

E. Changes to this Privacy Policy

TDL may revise this privacy policy at its sole discretion. Please check this page regularly for our current practices. If you have any questions about this privacy policy, the practices of this site, or your dealings with this site, you can contact support@tdl.org.

III. Community Norms

A. Creative Commons Zero (CC0) Designation [3]

The Texas Data Repository uses the CC0 option. This option lets others distribute, remix, tweak, and build upon your work, even commercially. “The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.” (Further information: https://creativecommons.org/about/cc0/)

B. Crediting Research Data with Citations [4]

TDL asks that all users who download datasets from the Texas Data Repository adhere to the following Community Norms. Any materials (books, articles, conference papers, theses, dissertations, reports, and other such publications) created that employ, reference, or otherwise utilize the data (in whole or in part) gathered from deposited datasets should credit the source with the applicable data citation generated by the Texas Data Repository (found on the dataset page). These citations include the data authors, data identifier, and other information in accordance with the Joint Declaration of Data Citation Principles (https://www.force11.org/group/joint-declaration-data-citation-principles-final) for all research data.

Data Citation Principles [6]

Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and scholarly record. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
The following Data Citation Principles cover purpose, function and attributes of citations:

  1. Importance: Data should be considered legitimate, citable products of research.
  2. Credit and Attribution: Data citations should facilitate giving scholarly credit and legal attribution to all contributors to the data.
  3. Evidence: In scholarly literature, the corresponding data should be cited.
  4. Unique Identification: A data citation should include a persistent method for identification that is globally unique.
  5. Access: Data citations should facilitate access to the data, associated metadata and other materials to make use of the referenced data.
  6. Persistence: Unique identifiers and metadata should persist, even beyond the lifespan of the data they describe.
  7. Specificity and Verifiability: Data citations should facilitate identification of the specific data that support a claim.
  8. Interoperability and Flexibility: Data citation should be flexible but should not compromise interoperability.

C. Maintaining Anonymity of Human Subjects

Users of the TDL service should not abuse the available data that relate to human subjects or use the materials to:

  1. obtain information that could directly or indirectly identify any research subjects, or obtain information to attempt to identify any research subjects;
  2. produce and/or publish connections among datasets that could identify individuals or organizations; or
  3. obtain (additional) information about or (additional) means of contact for already-identified subjects.

D. Third Party API Applications

If you are interested in building an API application designed (exclusively or not) to allow and provide access to Texas Data Repository and its materials and services, please keep in mind that such applications:

  1. must ensure that all users of the application read and agree to both the API Terms of Use and General Terms of Use;
  2. must post both the API Terms of Use and General Terms of Use in an adequately noticeable and conveniently accessible place so that all users of the application can easily find and view them;
  3. acknowledge and agree that Texas Data Repository is not otherwise affiliated with any third-party Dataverse API applications that provide access to Texas Data Repository, and therefore will not be held liable (in whole or in part) for any suits or damages incurred by the third-party Dataverse API application owners, administrators, and affiliates; and
  4. must clearly publish its third-party non-affiliation status in relation to Texas Data Repository in its application and disclaim any special relationship with Texas Data Repository outside of the ones arising out of the obligations agreed upon from the API Terms of Use and the General Terms of Use agreements.

IV. Data Usage Agreement [7]

This is an agreement (“Agreement”) between you the downloader and the owner of the data governing the use of the data and related materials to be downloaded.

A. Acceptance of the Data Usage Agreement

By downloading or otherwise accessing the Dataverse, downloader represents his/her acceptance of the terms of this agreement.

B. Use of Data

Use of the data and materials includes, but is not limited to: viewing parts or the whole of the content; comparing data or content from the materials with data or content in other datasets; verifying research results with the content included here; and extracting and/or appropriating any part of the content here for use in other projects, publications, research, or other related work products.

C. Representations and Warranties

In use of the data and materials, downloader represents that:

  1. Downloader is not bound by any pre-existing legal obligations or other applicable laws that prevent downloader from downloading or using the Materials;
  2. Downloader will not use the data in any way prohibited by applicable laws;
  3. Downloader has no knowledge of and will therefore not be responsible for any restrictions regarding the use of the data beyond what is described in this agreement; and
  4. Downloader has no knowledge of and will therefore not be responsible for any inaccuracies and any other such problems with regards to the content of the data and the accompanying citation information.

Restrictions: in his/her use of the materials, downloader cannot:

  1. obtain information from the materials that results in downloader or any third party(ies) directly or indirectly identifying any research subjects with the aid of other information acquired elsewhere; or
  2. produce connections or links among the information included in user’s datasets (including information in the materials), or between the information included in user’s datasets (including information in the materials) and other third-party information that could be used to identify any individuals or organizations.

The data is provided “as is” and “as available” and without warranty of any kind, including, but not limited to, non-infringement, merchantability and fitness for a particular purpose, and any warranties implied by any course of performance or usage of trade, all of which are expressly disclaimed.

Without limiting the foregoing, researchers who upload datasets do not warrant that:

  1. the materials are accurate, complete, reliable or correct;
  2. the material files will be secure;
  3. the materials will be available at any particular time or location;
  4. any defects or errors will be corrected;
  5. the materials and accompanying files are free of viruses or other harmful components; or
  6. the results of using the materials will meet downloader’s requirements.

Downloader’s use of the materials is solely at downloader’s own risk.

D. Limitation of Liability

In no event shall researchers be liable under contract or any other legal theory with respect to the data (i) for any direct damages, or (ii) for any lost profits or special, indirect, incidental, punitive, or consequential damages of any kind whatsoever.

E. Indemnification

Downloader will indemnify and hold uploaders of datasets harmless from and against any and all loss, cost, expense, liability, or damage, including, without limitation, all reasonable attorneys’ fees and court costs, arising from:

  1. Downloader’s misuse of the Materials;
  2. Downloader’s violation of the terms of this agreement; or
  3. Infringement by downloader or any third party of any intellectual property or other right of any person or entity contained in the materials.

Such losses, costs, expenses, damages, or liabilities shall include, without limitation, all actual, general, special, and consequential damages.

F. Dispute Resolution

Downloader and user agree that any cause of action arising out of or related to the download or use of data must commence within one (1) year after the cause of action arose; otherwise, such cause of action is permanently barred. This agreement shall be governed by and interpreted in accordance with the laws of the state of Texas. All disputes under this agreement will be resolved in the applicable state or federal courts of Texas. Downloader consents to the jurisdiction of such courts and waives any jurisdictional or venue defenses otherwise available.

G. Integration and Severability

This agreement represents the entire agreement between downloader and researchers with respect to the downloading/uploading and use of data, and supersedes all prior or contemporaneous communications and proposals between downloader and researcher. If any provision of this agreement is found to be unenforceable or invalid, that provision will be limited or eliminated to the minimum extent necessary so that the agreement will otherwise remain in full force and effect and enforceable.

H. Miscellaneous

No agency, partnership, joint venture, or employment relationship is created as a result of the Agreement and neither party has any authority of any kind to bind the other in any respect outside of the terms described within this Agreement.

V. Digital Preservation and Security

A. Policy

Purpose

It is the mission of the TDL to enable digital initiatives in support of research, scholarship, and learning in Texas. As a part of this mission, the TDL endeavors to collect, preserve, and disseminate scholarly materials for the benefit of both producers and consumers of academic research and scholarship. The TDL’s instance of the Dataverse Network, encompassing each of the dataverses of its member institutions, is the digital resource intended to address a consortium-level need for publishing, managing, and providing access to research-generated data sets. The following Digital Preservation Policy describes the extent to which the TDL will support sustainable access to the digital research data and related content deposited in the Texas Data Repository.

The preservation objectives of the Texas Data Repository are:

  • to collect, preserve, and disseminate the data sets and related information generated by researchers affiliated with any of the TDL’s member institutions who choose to deposit their content therein.
  • to enable researchers affiliated with any of the TDL’s member institutions to comply with the mandates of funding agencies to manage, preserve, and share their research data.
  • to provide the means for users to discover and access the data sets and metadata generated by academics affiliated with any of the TDL’s member institutions over the long term.

Part of the TDL’s vision in establishing a consortium Dataverse is to make research materials freely available to anyone, anywhere, and at any time. The TDL is an advocate for Open Access to scholarly work and the incentives to researchers for publishing and preserving their research data in the Texas Data Repository are:

  • data that might be precariously stored on fragile, random, or unsustainable storage devices can be securely preserved for the long term.
  • data that might otherwise become neglected over time can be preserved and made accessible for other interested researchers to use and cite, potentially providing wider visibility and impact for the research.
  • many funding agencies and scholarly journals require data management plans that detail how the data will be managed, made accessible, and preserved.

B. Scope

The TDL accepts the responsibility to preserve and provide access to research data, including associated metadata and documentation, that is properly deposited in the Texas Data Repository. This responsibility includes the provision of digital means to preserve and ensure ongoing access to said content for a minimum period of ten years after it is deposited in Dataverse. Long-term preservation of Dataverse content beyond the ten-year retention period is subject to the TDL’s selection criteria, appraisal of the content, and the budgetary and technical resources necessary to meet this goal. Metadata for content removed from Dataverse, regardless of reason or retention period, may be preserved for an undetermined period of time after said content’s removal.

The Texas Data Repository content will be selected and appraised according to the following preservation priorities and levels of commitment:

  1. Research data associated with publications – great effort will be made to ensure the long-term preservation of data associated with journal or scholarly publications, so long as the data meets the TDL collection policies and the Texas Data Repository remains the data’s hosted or cited repository.
  2. Stand-alone data publications with high research value – reasonable effort will be made to ensure the long-term preservation of data and metadata of stand-alone publications that library professionals identify as having high research value to the broader academic community.
  3. Other data files and materials – efforts may or may not be made to retain ephemeral materials considered to lack significant or long-term value, although particular files may be preserved on a select basis as appropriate.

Additionally, the Texas Data Repository will accept data submissions of any format, but only provides full support (i.e. data exploration, analysis, and meta-analysis via the TwoRavens suite of statistical tools) to tabular data preferably in the following formats:

  • SPSS (POR and SAV formats)
  • STATA
  • R data
  • CSV

These files can be in compressed ZIP format at ingest; however, they may not exceed 2 GB in size. Please see http://guides.dataverse.org/en/latest/user/tabulardataingest/index.html and http://guides.dataverse.org/en/latest/user/dataset-management.html for more specific information on data set and metadata formats.
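
As a rough illustration, a depositor could pre-check a file against the format and size guidance above before uploading. The sketch below is not part of the Texas Data Repository or Dataverse software: the function name, the list of file extensions, and the reading of 2 GB as 2 × 1024³ bytes are all assumptions made for the example.

```python
from pathlib import Path

# Assumption: common file extensions for the tabular formats listed above
# (SPSS POR/SAV, Stata, R data, CSV).
TABULAR_EXTENSIONS = {".por", ".sav", ".dta", ".rdata", ".csv"}

# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
MAX_UPLOAD_BYTES = 2 * 1024**3


def precheck_upload(filename: str, size_bytes: int) -> dict:
    """Hypothetical helper: report whether a file fits the size limit and
    whether its extension suggests a fully supported tabular format."""
    extension = Path(filename).suffix.lower()
    return {
        "within_size_limit": size_bytes <= MAX_UPLOAD_BYTES,
        "fully_supported_tabular": extension in TABULAR_EXTENSIONS,
    }


if __name__ == "__main__":
    # A small SPSS file: within the limit and in a fully supported format.
    print(precheck_upload("survey_2015.sav", 150 * 1024**2))
    # A large archive: over the limit and not a tabular format.
    print(precheck_upload("images.tar", 3 * 1024**3))
```

A file that fails the tabular check can still be deposited, since any format is accepted; it simply would not receive the full exploration and analysis support.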

Texas Data Repository provides basic, bit-level preservation through fixity checks and secure backup of deposited content. Further and more in-depth digital preservation activities and services must be provided by a digital preservation program at the institution where the research data was originally generated.

C. Strategic Plan

The TDL has an official backup strategy that requires all digital content to be:

  • copied nightly with versioning and kept for one year (individual files)
  • copied nightly as a snapshot and kept for one month (entire service)

The TDL systems also provide security services key to basic digital preservation, namely access control, network monitoring and protection, encryption, and system updates (see Information Security Policy). There are currently no limitations to the overall quantity of data that can be stored on TDL servers, only limitations on the size of individual files (2 GB) uploaded via the Texas Data Repository application.

Procedures

Dataverse best practices for data management and preservation include:

  • automatic extraction of metadata from tabular files and FITS
  • standard descriptive metadata schemas such as OAI DC, DDI (for statistical and social science), ISA-Tab (for biomedical), FITS (for astronomy)
  • re-formatting of tabular data to simple open format text files
  • data and metadata versioning; database maintenance
  • checksum generation upon ingest (UNF for tabular data, MD5 for other files)
  • persistent URL – DOI (minted by EZID)
  • deaccessioning of data, but not citation metadata, if necessary

The TDL systems infrastructure includes bit-level fixity checking via the Amazon S3 hosting service.
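
For readers unfamiliar with fixity checking, the idea can be sketched as follows: a checksum is recorded when a file is ingested, and the file is later re-hashed and compared against that record. This is a minimal illustration using MD5 from the Python standard library, not the mechanism that Dataverse or Amazon S3 actually runs; the function names are hypothetical, and UNF generation for tabular data is not covered.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_md5: str) -> bool:
    """Hypothetical check: does the file still match the digest recorded at ingest?"""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

A mismatch indicates that the stored bits have changed since ingest, at which point the file would be restored from a backup copy.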

References

The Dataverse Project, “Harvard Dataverse Preservation Policy,” http://best-practices.dataverse.org/harvard-policies/harvard-preservation-policy.html

Purdue University Research Repository (PURR), “PURR Digital Preservation Policy,” https://purr.purdue.edu/legal/digitalpreservation

Texas Digital Library, “Our Mission and Vision,” https://www.tdl.org/strategic-plan/vision/

Preserving digital Objects With Restricted Resources, “Tool Grid,” http://digitalpowrr.niu.edu/tool-grid/

Digital Curation Centre, “DataVerse,” http://www.dcc.ac.uk/resources/external/dataverse

Harvard Dataverse, “UCLA Social Science Data Archive Dataverse,” http://dataarchives.ss.ucla.edu/archive%20tutorial/archivingdata.html

Harvard’s Institute for Quantitative Social Science (IQSS), “About TwoRavens,” http://datascience.iq.harvard.edu/about-tworavens

University of North Carolina – The Odum Institute, “Digital Preservation Policies,” http://www.irss.unc.edu/odum/contentSubpage.jsp?nodeid=629

Harvard Dataverse Project, “User Guide: Tabular Data File Ingest,” http://guides.dataverse.org/en/latest/user/tabulardataingest/index.html

Elizabeth Quigley, IQSS-Harvard University, “The Expanding Dataverse,” http://dataverse.org/files/dataverseorg/files/introduction_to_dataverse.pdf?m=1447352697

VI. Information Security

Information security is a complex and vital element of maintaining any information system. The issues that threaten information security generally fall into three areas: systems security, data integrity, and regulatory and legal considerations. Vulnerabilities in web applications, internal processes, and authentication account for most threats to an organization’s information assets. These threats need to be constantly addressed and vulnerabilities continuously remediated.

The TDL actively addresses the need to ensure the accuracy, integrity, authenticity, and permanence of the digital content that it manages, as well as the security of the services and platforms that it provides. The TDL ensures the security of its Dataverse instance as follows:

A. System Security

The TDL systems and services are hosted with Amazon Web Services (AWS), which provides cloud security services and support (https://aws.amazon.com/security/), including:

  • Secure Network Architecture – segmentation and firewalls throughout
  • Secure Access Points – API endpoints allowing HTTPS access
  • Encryption – connections encrypted by SSL
  • Network Monitoring and Protection – against DDoS and MITM attacks, IP spoofing, etc.
  • Identity Management and Authentication – secure log-in via password and SSH key pair

Additionally, the TDL updates its Operating Systems (OS) quarterly at a minimum, and immediately when important security patches are made available.

B. Data Integrity

The TDL has an official backup strategy that requires all digital content to be stored in three distinct locations for all services including the Texas Data Repository. TDL will retain:

  1. the copy of the data residing on the production server (currently an EBS volume),
  2. nightly snapshots that can be used to restore the entire service to a particular date within the preceding month,
  3. a copy of all data files, made nightly with versioning and kept for one year, stored on Amazon S3 (https://aws.amazon.com/s3/); these copies can be used to restore individual files, but not the entire service.
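
The two retention windows determine what can be restored for a given date. The sketch below illustrates that logic only; it is not TDL tooling. The helper name is hypothetical, and "one month" and "one year" are approximated as 30 and 365 days, which is an assumption.

```python
from datetime import date, timedelta
from typing import List

# Assumptions: "one month" is approximated as 30 days, "one year" as 365 days.
SNAPSHOT_RETENTION = timedelta(days=30)
FILE_COPY_RETENTION = timedelta(days=365)


def restore_options(target: date, today: date) -> List[str]:
    """Hypothetical helper: list which backup tiers could cover `target`."""
    if target > today:
        raise ValueError("target date is in the future")
    age = today - target
    options = []
    if age <= SNAPSHOT_RETENTION:
        # Nightly snapshots can restore the entire service.
        options.append("full-service snapshot")
    if age <= FILE_COPY_RETENTION:
        # Versioned nightly copies can restore individual files only.
        options.append("individual file copy")
    return options
```

Under these assumptions, a state from last week could be recovered either way, a state from six months ago only file by file, and a state from two years ago not at all.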

Although the TDL does not curate or conduct preservation planning on content within the Texas Data Repository, it provides some lower-level services to help ensure the integrity of the data it hosts. In addition to the access control and network protection mentioned in the previous section, Amazon S3, where the Texas Data Repository is hosted, performs regular, systematic data integrity checks and is built to be self-healing. The TDL also ensures the accurate migration and/or transfer of data between storage spaces, servers, and systems whenever this becomes necessary.

C. Regulatory and Legal Considerations

The TDL requires Texas Data Repository contributors to remove, replace, or redact identifying, confidential, or sensitive information from datasets prior to upload. The Texas Data Repository will not serve this function and takes no responsibility for the inadvertent release of restricted and protected data. Users should contact the TDL to alert it to any data placed into TDL storage and/or infrastructure that is subject to FERPA, HIPAA, or other federal privacy standards. The TDL can offer dark storage options outside of the Texas Data Repository service for such instances.

The Texas Data Repository complies with Texas Administrative Code (TAC) 206.70 as set forth in the University of Texas Web Accessibility Policy (http://www.utexas.edu/cio/policies/web-accessibility).

References

Texas Digital Library Data Security Policy, April 2015, https://tdl.org/wp-content/uploads/downloads/2015/04/Texas-Digital-Library-Data-Security-Policy.pdf

Texas Digital Library Data Management Talking Points, November 2013, http://tdl.org/wp-content/uploads/downloads/2013/11/datamgmt-talking-points-11.19.2013.pdf

University of Texas Web Accessibility Policy, 23 March 2015, http://www.utexas.edu/cio/policies/web-accessibility

Amazon AWS Cloud Security, website, https://aws.amazon.com/security/

Amazon AWS Identity and Access Management (IAM), website, https://aws.amazon.com/iam/

Amazon Web Services: Overview of Security Processes, August 2015, https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Whitepaper.pdf

Digital Preservation Coalition: Information Security, website, http://www.dpconline.org/advice/preservationhandbook/technical-solutions-and-tools/information-security

VII. Deaccessioning Data

Items may be deaccessioned from the repository for the following reasons:

  • copyright violation
  • legal requirements and proven violations
  • national security
  • falsified research
  • confidentiality concerns, etc.

Items may also be deaccessioned from the repository by the depositor. Deaccessioning a dataset or a version of a dataset is a very serious action that should only occur if there is a legal or valid reason for the dataset to no longer be accessible to the public. If you absolutely must deaccession, you can deaccession a version of a dataset or an entire dataset. To deaccession, go to a dataset you’ve already published (or add a new one and publish it), click on Edit Dataset, then Deaccession Dataset. If you have multiple versions of a dataset, you can select here which versions you want to deaccession or choose to deaccession the entire dataset. You must also include a reason as to why this dataset was deaccessioned from a dropdown list of options. There is also a free-text box to add more details as to why this was deaccessioned. If the dataset has moved to a different repository or site you are encouraged to include a URL (preferably persistent) for users to continue to be able to access this dataset in the future.

Important Note: A tombstone landing page with the basic citation metadata will always be accessible to the public if they use the persistent URL (Handle or DOI) provided in the citation for that dataset. Users will not be able to see any of the files or additional metadata that were available prior to deaccession.

Should a dataset be removed by either the repository or the depositor, TDL reserves the right to retain its citation metadata record in the repository as a trace of the dataset. Additionally, the citation metadata of withdrawn items will remain searchable.

References

DISC-UK DataShare Project, “Policy-making for Research Data in Repositories: A Guide,” https://www.coar-repositories.org/files/guide.pdf

Dataverse Project, “User Guide: Dataset + File Management,” http://guides.dataverse.org/en/latest/user/dataset-management.html

Footnotes
1. These General Terms of Use are adapted from Harvard Dataverse generic best practices templates created for these purposes. For the original, see: http://best-practices.Dataverse.org/harvard-policies/harvard-terms-of-use.html
2. The Privacy Policy is adapted from Harvard Dataverse best practices generic templates created for these purposes. For the original, please see: http://best-practices.Dataverse.org/harvard-policies/harvard-privacy-policy.html
3. Adapted from https://creativecommons.org/publicdomain/zero/1.0/
4. Adapted from the Data Citation Synthesis Group, “Joint Declaration of Data Citation Principles”: https://www.force11.org/group/joint-declaration-data-citation-principles-final
5. The Texas Data Repository Community Norms are adapted from Harvard Dataverse best practices templates created for these purposes. For original templates, please see http://best-practices.Dataverse.org/harvard-policies/community-norms.html. Important modifications to this section include more extensive use of the Joint Declaration of Data Citation Principles.
6. Adapted from Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [/datacitation].
7. The Data Usage Agreement is adapted from the Harvard best practices templates created for these purposes. For original template, please see http://best-practices.Dataverse.org/harvard-policies/sample-dua.html