The Faculty Research Data Management Incentivising Programme Fund was established in 2023 by the Committee on Research Data Management of the University’s Research Committee to promote good research data management practice among faculty members and students, engage them in research data management events, and nurture their skills throughout the research life cycle. Faculty Representatives of each Faculty, or their delegates, are invited to serve as principal investigators.

The Faculty Research Data Management Incentivising Programme Fund supported eight projects, awarding a total of HKD 800,000. Details of the projects are as follows:

 

Faculty of Arts
Title: Workshops on Research Data Management for the Humanities
Principal Investigator: YIP Choy Yin Virginia (Department of Linguistics and Modern Languages)
Co-investigator: MAI Ziyin (Department of Linguistics and Modern Languages)
Abstract: The project aims to promote good research data management (RDM) practice and to engage researchers and students in RDM promotion events targeting departments in the Faculty of Arts (FoA). As research data is recognized as a valuable institutional asset for research excellence, researchers in FoA need a better understanding of the value of RDM and of good practices, which will give them an edge when competing for external grants.

Compared to other faculties, FoA has so far underutilized RDM at CUHK. To raise awareness of the importance of RDM and make use of the facilities provided by CUHK, we will organize workshops that promote the value and importance of RDM for research excellence and offer practical help with RDM to departments across FoA, which encompasses diverse disciplines including Anthropology, Chinese, Cultural and Religious Studies, English, Fine Arts, History, Japanese Studies, Linguistics, Philosophy and Translation. We will highlight the different resources and support available from CUHK to help researchers develop good research data management skills.

The deliverables include workshops dedicated to promoting RDM in FoA, and the intended outcome is to boost the effective use of RDM. A number of workshops will be organized featuring speakers who have made successful use of RDM. Sharing sessions will present successful cases that demonstrate the effectiveness of RDM, which will in turn facilitate FoA colleagues’ understanding and use of RDM. The key performance indicators include the attendance and participation rate of faculty members and research personnel in the workshops and participants’ ratings of the workshop activities and goals. The tentative timeline is as follows:
Term 2 (March 2024): Workshop 1
Summer (June 2024): Workshop 2
Term 1 (September 2024): Workshop 3
Term 2 (March 2025): Workshop 1
Summer (June 2025): Workshop 2
Term 1 (September 2025): Workshop 3

Participants in the workshops will be asked to give feedback on the usefulness of the presentations. We will keep track of the utilization rate of RDM among workshop participants and check whether and how they benefit in terms of enhancing the quality of their research and overall management of datasets.

To evaluate whether colleagues and students put RDM into practice after attending the workshops, the Faculty will ask the Library to provide a quarterly report on the number of DMPs registered by the Faculty in DMPTool and the number of datasets logged in the CUHK Research Data Repository (RDR). These figures will be used to assess how effectively the workshops increase the deployment of RDM. It is anticipated that the Faculty of Arts’ utilization of RDM at CUHK will rise compared to the pre-workshop period and will continue to increase incrementally. We will also engage the Faculty Data Champion in supporting the organization and promotion of the workshops.
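As an illustration of how the Library’s quarterly figures could be compared against the pre-workshop period, the Python sketch below computes each quarter’s change from a baseline quarter and checks whether the counts are rising steadily. The `QuarterlyReport` structure and the function names are hypothetical; the Library’s actual report format is not specified in the proposal, and the numbers in the usage lines are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterlyReport:
    """Counts for one quarter (hypothetical structure, not the Library's format)."""
    quarter: str
    dmps_registered: int     # DMPs registered by the Faculty in DMPTool
    datasets_deposited: int  # datasets logged in the CUHK RDR


def change_from_baseline(baseline: QuarterlyReport,
                         reports: list[QuarterlyReport]) -> list[tuple[str, int, int]]:
    """Return (quarter, change in DMPs, change in datasets) relative to the baseline."""
    return [
        (r.quarter,
         r.dmps_registered - baseline.dmps_registered,
         r.datasets_deposited - baseline.datasets_deposited)
        for r in reports
    ]


def rising_steadily(reports: list[QuarterlyReport]) -> bool:
    """True if neither count falls from one quarter to the next."""
    return all(
        later.dmps_registered >= earlier.dmps_registered
        and later.datasets_deposited >= earlier.datasets_deposited
        for earlier, later in zip(reports, reports[1:])
    )


# Usage with made-up placeholder numbers (not real CUHK figures):
baseline = QuarterlyReport("2023-Q4", 2, 1)
reports = [QuarterlyReport("2024-Q1", 3, 1), QuarterlyReport("2024-Q2", 5, 2)]
print(change_from_baseline(baseline, reports))  # [('2024-Q1', 1, 0), ('2024-Q2', 3, 1)]
print(rising_steadily(reports))                 # True
```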

Research staff, including a part-time Research Associate and undergraduate and postgraduate students, will be recruited to support the organization and publicity of the workshops. They will also provide practical help to Arts Faculty members who send general inquiries about research data management.

A report at the end of the project will discuss the workshops conducted from January 2024 to December 2025, evaluate their success, and identify areas to be addressed in the future.

Start Date: 2-Jan-2024
End Date: 31-Dec-2025

Faculty of Business Administration
Title: Business Research Data Acquisition and Maintenance
Principal Investigator: WU Jing (Department of Decisions, Operations, and Technology)
Abstract: The data management work is introduced and described in detail below. First, the project aims to clean and structure two new databases, UN Comtrade and US Patent. These databases are highly pertinent to my current research on global supply chains and technological improvement. I describe the two databases and how I will manage them below.

(a) UN Comtrade
UN Comtrade is a comprehensive international trade database maintained by the United Nations Statistics Division. It provides detailed information on global merchandise trade, including import and export data reported by countries worldwide. UN Comtrade covers a wide range of commodities, countries, and years, making it a valuable resource for analyzing international trade patterns and trends.

To retrieve the UN Comtrade data, our team plans to access the UN Comtrade database through its online portal or API service. We will select the appropriate data parameters and download the data files in a suitable format, such as CSV or Excel. We will then clean the data by handling missing values, standardizing variables, addressing data anomalies, and aggregating the data.
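A minimal Python sketch of these cleaning steps, using only the standard library. It assumes a hypothetical CSV export with columns `reporter`, `partner`, `year` and `trade_value`; real UN Comtrade downloads use their own column names, which would need to be mapped first. Rows with missing values are dropped, negative trade values are treated as anomalies, reporter codes are standardized, and values are aggregated by reporter and year.

```python
import csv
import io
from collections import defaultdict


def clean_and_aggregate(csv_text: str) -> dict[tuple[str, int], float]:
    """Clean raw trade rows and sum trade values by (reporter, year).

    Assumes hypothetical columns: reporter, partner, year, trade_value.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_value = (row.get("trade_value") or "").strip()
        if not raw_value:                 # missing value: skip the record
            continue
        value = float(raw_value)
        if value < 0:                     # anomaly: trade values cannot be negative
            continue
        reporter = row["reporter"].strip().upper()     # standardize reporter codes
        totals[(reporter, int(row["year"]))] += value  # aggregate by reporter-year
    return dict(totals)


raw = (
    "reporter,partner,year,trade_value\n"
    "hkg,CHN,2022,100.5\n"
    "HKG ,USA,2022,50\n"
    "HKG,JPN,2022,\n"
    "USA,CHN,2022,-3\n"
    "USA,CHN,2023,20\n"
)
print(clean_and_aggregate(raw))  # {('HKG', 2022): 150.5, ('USA', 2023): 20.0}
```

Dropping anomalous rows is only one option; flagging them for review instead would preserve them for later inspection.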

(b) US Patent
The United States Patent and Trademark Office (USPTO) provides a vast and comprehensive collection of patent data, making it a fundamental resource for researchers, innovators, and businesses. This data encompasses a wide array of information, including patent filings, granted patents, citations, inventors’ details, locations, and technical specifications.

Organizing the patent dataset involves a series of careful steps. First, we will download data from the official USPTO website, ensuring we have the most recent and comprehensive datasets. To enhance its utility, we will employ advanced fuzzy matching algorithms to align firms in the USPTO dataset with other commonly used data sources. We will then manually review the matching results to eliminate discrepancies and further refine the dataset. Finally, we will organize it into a structured relational database, which will serve as one of the foundations of our research.
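The proposal does not specify which fuzzy matching algorithm will be used. As a simple stand-in, the sketch below normalizes firm names and scores them with Python’s standard `difflib.SequenceMatcher`; the suffix list and the 0.85 threshold are illustrative assumptions. Names that do not reach the threshold return `None`, and borderline matches would in practice go to the manual review step described above.

```python
import re
from difflib import SequenceMatcher

# Illustrative list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "company", "ltd", "limited", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(assignee: str, candidates: list[str], threshold: float = 0.85):
    """Return (candidate, score) for the closest firm name, or None below the threshold."""
    target = normalize(assignee)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


print(best_match("International Business Machines Corp.",
                 ["INTERNATIONAL BUSINESS MACHINES CORPORATION", "Intel Corp"]))
# ('INTERNATIONAL BUSINESS MACHINES CORPORATION', 1.0)
print(best_match("Apple Inc.", ["Microsoft Corp"]))  # None
```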

With this funding, we plan to apply the data to business research. Using these datasets, we can explore business topics related to geopolitics, global supply chain restructuring, reshoring, and innovation coordination along supply chains.

Second, my team has already acquired and cleaned more than 20 databases. To enhance their usability, this proposal also aims to store all of these databases in SQL format. After that, I will organize two workshops within the Faculty of Business Administration. At the first workshop, we will invite staff from CUHK Library to introduce the RDM service, the DMP writing platform DMPTool, and the CUHK Research Data Repository. At the second workshop, we will share insights into managing research databases efficiently, including introducing the datasets, displaying the structured data, and inspiring potential research ideas.

I describe the detailed steps below.

Data Structuring and Storage
The project aims to streamline the use of the acquired databases by structuring them and saving them in SQL format. This will enable convenient access to and retrieval of relevant data for different research purposes. This structured approach will ensure that researchers do not need to repeatedly handle raw, unstructured, and large-scale datasets. Instead, they will have access to organized databases and constructed variables.
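The proposal says only “SQL format” without naming a database engine. Assuming SQLite through Python’s built-in `sqlite3` module, purely for illustration, a cleaned table could be stored and queried as follows; the `trade_flows` schema is hypothetical.

```python
import sqlite3


def store_trade_rows(conn: sqlite3.Connection,
                     rows: list[tuple[str, str, int, float]]) -> None:
    """Create a structured table (hypothetical schema) and load cleaned rows into it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trade_flows (
               reporter    TEXT    NOT NULL,
               partner     TEXT    NOT NULL,
               year        INTEGER NOT NULL,
               trade_value REAL    NOT NULL,
               PRIMARY KEY (reporter, partner, year)
           )"""
    )
    # Parameterized insert; INSERT OR REPLACE keeps one row per key if the load is re-run.
    conn.executemany("INSERT OR REPLACE INTO trade_flows VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def total_by_reporter(conn: sqlite3.Connection, year: int) -> list[tuple[str, float]]:
    """Retrieve total trade value per reporter for one year."""
    cur = conn.execute(
        "SELECT reporter, SUM(trade_value) FROM trade_flows "
        "WHERE year = ? GROUP BY reporter ORDER BY reporter",
        (year,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # a file path would give a persistent database
store_trade_rows(conn, [("HKG", "CHN", 2022, 100.0),
                        ("HKG", "USA", 2022, 50.0),
                        ("USA", "CHN", 2022, 20.0)])
print(total_by_reporter(conn, 2022))  # [('HKG', 150.0), ('USA', 20.0)]
```

Researchers could then query the structured tables directly instead of re-processing the raw downloads each time.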

Documentation and Variable Description
To facilitate seamless use of the structured databases, we recognize the importance of providing comprehensive documentation. For each database, we will prepare detailed documentation that describes the data cleaning procedures and outlines the variables available for direct use. This documentation will serve as a valuable resource for researchers, offering insights into the data’s quality, limitations, and potential applications.
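One way to keep the variable descriptions in step with the stored tables is to generate a data dictionary from the database schema itself. The sketch below assumes SQLite and reads column names and declared types with `PRAGMA table_info`; the `descriptions` mapping and the example `patents` table are hypothetical.

```python
import sqlite3


def data_dictionary(conn: sqlite3.Connection, table: str,
                    descriptions: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (column, declared type, description) for every column of a table."""
    if not table.isidentifier():  # PRAGMA arguments cannot be bound as parameters
        raise ValueError(f"unexpected table name: {table!r}")
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type, descriptions.get(name, "(description pending)"))
            for _cid, name, col_type, *_rest in rows]


def render(entries: list[tuple[str, str, str]]) -> str:
    """Format the dictionary as plain-text lines for the documentation."""
    return "\n".join(f"{name} ({col_type}): {desc}" for name, col_type, desc in entries)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patents (patent_id TEXT, grant_year INTEGER)")
print(render(data_dictionary(conn, "patents", {"patent_id": "USPTO patent number"})))
# patent_id (TEXT): USPTO patent number
# grant_year (INTEGER): (description pending)
```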

Data Integrity
To prevent data tampering, we will back up the data and keep access to the backups strictly limited.
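A common complement to restricted backups, assumed here rather than stated in the proposal, is to record SHA-256 checksums of each data file at backup time so that any later modification can be detected. A minimal sketch using Python’s `hashlib`; the manifest itself would also need to be kept under restricted access.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> dict[str, str]:
    """Record a checksum for each file at backup time."""
    return {str(p): sha256_of(p) for p in paths}


def modified_files(manifest: dict[str, str]) -> list[str]:
    """List files whose current checksum no longer matches the manifest."""
    return [name for name, expected in manifest.items()
            if sha256_of(Path(name)) != expected]
```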

Research Data Management Workshops
The project will deliver two workshops within the Faculty of Business Administration. At the first workshop, we will invite staff from CUHK Library to introduce the RDM service, the DMP writing platform DMPTool, and the CUHK Research Data Repository. The Library can generate statistical reports on DMP and Repository usage. During the workshop, we will present these reports and statistics to the audience, aiming to raise awareness of RDM in the Faculty and encourage RDM practice in research. We can also circulate the reports via WeChat, LinkedIn, etc.

At the second workshop, we will share insights into managing research databases efficiently, including introducing the datasets, displaying the structured data, and inspiring potential research ideas. Specifically, we will share the tools used to clean and store the datasets (mainly Python and SQL) and basic coding skills, and we will demonstrate each step of the data management process.

These workshops will provide researchers with the knowledge and skills needed to manage research data. We will cover various aspects of data management, including potential research questions that the datasets can address, how we acquired the datasets, data cleaning techniques, documentation, and data security and storage. By equipping researchers with this knowledge, we aim to foster a culture of responsible and efficient research data management within the Faculty.

After each workshop, we will collect evaluations from the workshop participants. Specifically, we will send an online questionnaire to gather feedback on RDM awareness, the content and delivery of the workshop, and overall satisfaction.

Data Sharing
We plan to share the cleaned UN Comtrade and US Patent datasets in SQL format within the Faculty of Business Administration so that the data are accessible and reusable.

Expected timeline:
Jan 2024-Jun 2024: Acquire and clean datasets
Jul 2024-Dec 2024: Structure the data and store it in SQL format
Jan 2025-Aug 2025: Construct the most frequently used variables and prepare documentation
We plan to organize the first workshop in Sept. 2024 and the second workshop in Jan. 2025.

Start Date: 1-Jan-2024
End Date: 31-Aug-2025

Faculty of Education
Title: Research Data Management and Application in Education
Principal Investigator: OU Dongshu (Department of Educational Administration and Policy)
Co-investigator: TEO Timothy (Department of Educational Psychology)
Abstract: The funding will mainly be used to support a series of lunchtime seminars/workshops on research data management. We will invite regional/international* guest speakers to share their expertise in research data management and its application in Education. Each Department is welcome to suggest speakers, and the PI and Co-I will select speakers from the pool of suggestions with the support of one full-time staff member at the Faculty of Education.

We expect to host no more than eight seminars between January 2024 and December 2025, including workshop(s) offered by the Library team introducing the development of RDM at CUHK. Before each seminar, Faculty members can sign up for a one-to-one consultation with the guest speaker, who may offer advice after the workshop at a reserved desk space in the Faculty of Education.

Evaluation and feedback on the workshops and consultation service will be collected online, covering the relevance of the topic and whether the workshop/consultation has increased participants’ knowledge of RDM. We will also ask demographic/background questions to understand how the workshops/consultations have benefited different groups of faculty members and students, and include further feedback questions to learn more about our participants and their opinions. Improvements to the workshops will be adopted where feasible (for example, offering more introductory workshops if some topics or speakers’ sharing are too advanced). We will also ask the Library to report the Faculty’s usage of the CUHK Research Data Repository and DMP service on a half-yearly or annual basis, which will enable the Faculty to assess the influence of the workshops and training. Lastly, we will collaborate with the Data Champion (DC) in our Faculty to promote RDM via these workshops and to build a network of faculty members and graduate students engaged in RDM.

November 2023-December 2023: 1st Call for nominations of data experts.
January 2024 – June 2024: 2 lunch workshops
July 2024 – August 2024: 2nd Call for nominations of data experts
September 2024 – June 2025: 4 lunch workshops
July 2025 – August 2025: 3rd Call for nominations of data experts
September 2025 – December 2025: 2 lunch workshops

*Please note that our proposed budget does not cover travel or accommodation costs for international speakers. These speakers might be supported by other funding sources, such as Department/Faculty support funds or research grants.

Start Date: 1-Jan-2024
End Date: 31-Dec-2024

Faculty of Engineering
Title: Research Data Management for Engineering
Principal Investigator: MENG Helen Mei Ling (Department of Systems Engineering and Engineering Management)
Abstract: Research Data Management is of growing importance. While data has been vital in developing models in Engineering research, good research data management practice is equally important for the data to be reused in further research. This project aims to raise awareness of Research Data Management (RDM) and its practice, and to promote the RDM service and infrastructure at CUHK to staff and students in the Faculty of Engineering.

Activities to be organized in Faculty of Engineering:
1. Workshop on Basics of Research Data Management (Workshops will be recorded for a wider audience from Faculty of Engineering.)
Intended outcome: To transfer knowledge of standard RDM principles to faculty members and students in the Faculty of Engineering.
KPI: End-of-workshop evaluations will be conducted to review the effectiveness of the workshops.

2. Seminar on the development of research data services at CUHK, offered by the CUHK Library (Seminars will be recorded for a wider audience from the Faculty of Engineering.)
Intended outcome: To boost the effective use of Research Data Services at CUHK by Faculty of Engineering members.
KPI: A survey will be conducted after the seminars to gauge Faculty of Engineering members’ intention to use the research data services at CUHK.

3. Training sessions on RDM practice, including creating Data Management Plans (DMPs) and data deposit at the CUHK Research Data Repository by CUHK Library and guest speakers
Intended outcome: Researchers and students in the Faculty of Engineering will create Data Management Plans at the proposal stage and share their data in a public data repository at the end of their research.
KPI: The Faculty will invite the CUHK Library to provide semi-annual usage reports on DMP creation and data deposit at the CUHK Research Data Repository by Faculty of Engineering researchers.

4. Invited talks on RDM for Engineering and the development of infrastructure on RDM
Intended outcome: Researchers in the Faculty of Engineering understand how the field of Engineering supports the development of research data management globally.
KPI: Evaluations will be conducted to review how the talks influence the RDM practice and the research directions of faculty members and students at Faculty of Engineering.

Besides the events listed above, support from the Data Champion(s) in the Faculty of Engineering and the CUHK Library will guide researchers in practising research data management in their research and coursework.

Timeline:
Jan 2024–Apr 2024: Workshops on Basics of Research Data Management
May 2024–Sep 2024: Seminars on development of research data service at CUHK
Sep 2024–Dec 2024: Training sessions on RDM practices
Jun 2024–Dec 2025: Invited talks on RDM

Start Date: 1-Jan-2024
End Date: 31-Dec-2025

Faculty of Law
Title: Workshop on Empirical Legal Research and Data Management
Principal Investigator: CHENG Kevin Kwon Yin (Faculty of Law)
Abstract: At the Faculty of Law, most research by colleagues does not involve the use of data. However, a growing number of colleagues engage in empirical legal research that generates data. This includes quantitative data collected from court cases in different legal jurisdictions, data from international agreements, and qualitative data from interviews with professionals and stakeholders. In short, compared with other Faculties, we likely have less experience in data management because we do not use data as much as others.

With that said, the current Faculty Research Data Management Incentivising Programme Fund provides our Faculty with a good opportunity to develop our expertise in this vital area. We propose to host one to two workshops with speakers from outside CUHK to discuss matters relating to the collection and management of legal data. The idea is to host workshops that are specifically related to our field, legal research, rather than general workshops on data management. We propose to combine discussions of empirical legal research, the collection of legal data, the potential impact of such data (such as creating a court database), how to manage such data, and related privacy and intellectual property issues in these workshops.

Within the workshop(s), one of the main features would be a case study in law (such as the PI’s own research) that walks through research data management (RDM) practices across the research life cycle. The PI’s own project builds an original dataset by coding the Hong Kong Judiciary’s ‘Reasons for Sentence.’ The dataset captures sentencing factors, namely aggravating and mitigating factors, and sentencing decisions across three offences: drug trafficking, assault, and burglary. An individual application has been submitted for this project under the Research Data Management Development Fund. If both applications are funded, the PI’s project will be used as a guide to share the RDM aspects of the research, in particular the PI’s data management plan using the CUHK DMP template and the dataset that will be uploaded to the CUHK Research Data Repository.

In addition, speakers will be invited to discuss RDM practices within the research life cycle tailored for legal studies, including creating and finding data (such as from court cases and other sources) and organizing, storing and preserving data (using the 3-2-1 backup rule). Importantly, attendees will be encouraged to use the RDM tools made available by CUHK, such as the aforementioned CUHK DMP template and CUHK Research Data Repository, for their own research. As Faculty members and students are unlikely to be familiar with research data management terminology, the proposed workshop(s) will help close this gap. Discussions of the FAIR principles, intellectual property and the Personal Data (Privacy) Ordinance (Cap. 486) would also be of interest to legal scholars.
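The 3-2-1 backup rule is usually stated as: keep at least three copies of the data, on at least two different types of storage media, with at least one copy stored offsite. The Python sketch below checks a backup plan against that rule; the `Copy` structure and the location and medium labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical structure for illustration)."""
    location: str  # e.g. "office desktop"
    medium: str    # e.g. "internal disk", "external drive", "cloud storage"
    offsite: bool  # stored away from the primary location?


def rule_3_2_1_problems(copies: list[Copy]) -> list[str]:
    """Return the ways a backup plan falls short of the 3-2-1 rule (empty if compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 storage media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


plan = [
    Copy("office desktop", "internal disk", offsite=False),
    Copy("desk drawer", "external drive", offsite=False),
    Copy("university cloud", "cloud storage", offsite=True),
]
print(rule_3_2_1_problems(plan))      # []
print(rule_3_2_1_problems(plan[:2]))  # ['fewer than 3 copies', 'no offsite copy']
```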

For these workshops, all staff, in particular staff that engage in empirical legal research, all RPg students, and all postdoctoral students and research assistants will be invited to attend. That is essentially everyone in the Faculty of Law. Subsequently, all researchers will be reminded to embed RDM practices into their research proposals and research projects in research committee meetings, in GRF/ECS applications and progress reports for RPg students.

The workshop(s) held in the faculty will also promote research data management events at the University and elsewhere. That is why having external speakers is beneficial.

The proposed timeline is to conduct these one to two workshops in 2024 or 2025. The reason for the longer timeline is to give colleagues at the Faculty an opportunity to experiment with research data management on their own first, perhaps through the Research Data Management Development Fund, so that they can share their experiences and challenges in these workshops. The objective is to have colleagues engage in the workshops and not simply to listen to others’ experiences.

The key performance indicator is participation in the workshop(s) by Faculty members, including full-time staff and RPg students. Ultimately, the goal is to encourage deployment of RDM practice in future projects in the Faculty of Law. The activities proposed in this application are an important first step in that direction.

Start Date: 1-Jan-2024
End Date: 31-Dec-2025

Faculty of Medicine
Title: Workshops on Research Data Management for Patient and Healthcare Data
Principal Investigator: YIP Terry Cheuk-Fung (Department of Medicine and Therapeutics)
Co-investigator: WONG Grace Lai-Hung (Department of Medicine and Therapeutics)
Abstract: The Faculty of Medicine recognizes research data as a valuable resource for achieving research excellence. For studies involving patient data, ensuring data security is crucial to safeguarding patient privacy. Staff and students should therefore understand the importance of research data management (RDM) and adhere to established best practices. A better understanding will give researchers a competitive edge and enhance their ability to secure external grants. The Medical Data Analytics Centre (MDAC; https://mdac.cuhk.edu.hk/) was established in March 2020 and is affiliated with the Department of Medicine and Therapeutics, Faculty of Medicine. With the growing trend of using large-scale healthcare data for research, MDAC is devoted to supporting colleagues in conducting advanced data analysis while upholding good data management practices and safeguarding data privacy.

The proposed project aims to raise awareness among staff and students in the Faculty of Medicine of the significance of RDM, promote effective RDM practices, engage researchers from various departments in the Faculty, and highlight the resources and support available from CUHK to help researchers cultivate good RDM skills. The activities to be organized by MDAC and their timeline are as follows.
Activities and timeline:
Consultation services on good data management for individual projects by MDAC: January 2024 to December 2025
Webinars introducing RDM principles and concepts and the research data services available at CUHK (on-demand videos of recorded webinars available): March 2024
Engagement activities – online quizzes with incentives (e.g., online book coupons): May 2024 and May 2025
Hands-on workshops on creating data management plans (DMPs) with the DMPTool (https://dmptool.org): June 2024 and June 2025
Seminars on proper RDM skills for healthcare data – Dos and Don’ts in healthcare research data management: September 2024 and September 2025
Workshops on storage, access, quality, and security of healthcare data – upholding patient privacy and maintaining data integrity: March 2025

Participants will be encouraged to evaluate the workshops and provide feedback on the value of the presentations. We will also monitor the uptake of good RDM practices among participants and assess the extent to which the research quality and data management of their projects improve.

To assess the impact of the workshops and seminars on the implementation of RDM, the Faculty plans to ask the CUHK Library to provide a report every 6 months. This report will include the number of DMPs registered by the Faculty in DMPTool and the number of datasets logged in the CUHK Research Data Repository (RDR). The Faculty of Medicine is expected to increase its use of RDM compared to the period before the workshops and seminars, with gradual and continuous growth. Additionally, we have contacted the Faculty Data Champion members, who will help support the organization and promotion of the workshops. Research assistants will be recruited to help organize and promote the workshops and seminars and to set up online quizzes as engagement activities. They will also assist Faculty members who consult us on good data management for individual projects.

Upon the completion of the project, a final report will be written to evaluate the workshop activities conducted between January 2024 and December 2025. This report will comprehensively assess the achievements of the workshops, examining their effectiveness and identifying areas that need to be prioritized for future improvements and development.

Start Date: 2-Jan-2024
End Date: 31-Dec-2025

Faculty of Science
Title: Workshops for the Implementation and Promotion of Research Data Management in Faculty of Science
Principal Investigator: NGAI To (Department of Chemistry)
Abstract: The Faculty RDM Fund aims to address the critical need for proper management and preservation of research data generated by members of the Faculty of Science, covering RDM concepts and discipline-specific RDM practices across the Faculty’s wide range of academic units, including Life Sciences, Chemistry, Mathematics, Physics, Statistics and Earth & Environmental Sciences. Additionally, the proposal introduces the research data service at CUHK Library and outlines activities that will promote RDM practice to a wide audience within the Faculty.

This proposal is aligned with the RDM policy and guidelines of The Chinese University of Hong Kong (CUHK), as outlined in the following resources:
– CUHK RDM Policy and Guidelines:
[https://www.lib.cuhk.edu.hk/sites/cuhk/files/page/research/Data/RDM_Guidelines-Approved.pdf]
– CUHK Research Data Repository:
[https://www.lib.cuhk.edu.hk/en/research/data/data-repository]
– CUHK RDM LibGuide:
[https://libguides.lib.cuhk.edu.hk/rdm]

Objective:
The primary objective of the Faculty Research Data Management Fund is to promote RDM practice among faculty members in a way that aligns with discipline-specific RDM practices. Through training workshops, participation in overseas conferences, and synergy between the centralised service and the Faculty Data Champion Scheme, we aim to enhance RDM literacy and enable adoption of RDM practice among researchers and students (UG & PG), leading to improved data management and sharing in future projects.

Funding deliverables:
Faculty RDM funding will be utilised for the following purposes:
(1) Workshops covering RDM Concepts and Subject Disciplinary Good Practices to promote the value and importance of RDM for research excellence
– Faculty members from the Physical Science and Biological Science panels who have made successful use of RDM will be invited to share at the workshops, demonstrating the significance and effectiveness of RDM and how RDM enhances the academic impact of their research. Separate workshops will be tailored to the specific needs of different disciplines (biological and physical sciences) within the Faculty
– Promotion: Faculty Webpage, Massmail to Units
– Mandatory nomination from all Units to ensure a wide spectrum of participants across the Faculty to facilitate the promotion of RDM at the Faculty.
– Workshops will be offered in hybrid mode to maximise engagement among researchers and students
– Tentative Timeline: (i) Biological Science RDM Workshop: June 2024, Feb 2025; (ii) Physical Science RDM Workshop: Nov 2024, Oct 2025

(2) Workshops to introduce research data service at CUHK
– We will actively seek collaboration with CUHK Library to re-run the RDM workshop at the Faculty to leverage their expertise and CUHK resources in RDM.
– Promotion: Faculty Webpage, Massmail to Units
– Mandatory nomination from all Units to ensure a wide spectrum of participants across the Faculty to facilitate the promotion of RDM at the Faculty.
– Workshops will be offered in hybrid mode to maximise engagement among researchers and students
– Tentative Timeline: Feb to Mar 2024

(3) Research Data Service
– Collaborating with the Data Champion selected by the Library, student helpers and research assistants will be recruited to support the organisation and publicity of the workshops. They will also provide practical help to answer general enquiries about Research Data Management raised by Faculty members.
– Student helpers will also be appointed by the CRDM Faculty representative to help implement the RDM initiatives within the Faculty
– Tentative Timeline: Service will be available throughout the funding period (Jan 2024 to Dec 2025)
For general evaluation of the implementation of RDM practice at the Faculty, we will track RDM usage by requesting the Library to provide a quarterly report on the number of DMPs registered by the Faculty in DMPTool and the number of datasets logged in the CUHK RDR. KPIs of the Deliverables for the workshops include: (i) attendance and participation rate of faculty members and research personnel; (ii) participants’ ratings of the workshop activities and effectiveness.
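The two workshop KPIs above could be computed from attendance records and questionnaire responses roughly as in the sketch below. It assumes ratings on a 1 to 5 scale and uses the number of registered participants as the denominator for the attendance rate; both are assumptions, since the proposal does not define them.

```python
from statistics import mean


def workshop_kpis(registered: int, attended: int, ratings: list[int]) -> dict[str, float]:
    """Attendance rate and mean participant rating for one workshop.

    Assumes ratings on a 1-5 scale and registrations as the attendance denominator.
    """
    if registered <= 0 or attended < 0:
        raise ValueError("registered must be positive and attended non-negative")
    if not ratings or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    return {
        "attendance_rate": attended / registered,
        "mean_rating": float(mean(ratings)),
    }


print(workshop_kpis(registered=40, attended=30, ratings=[4, 5, 3, 4]))
# {'attendance_rate': 0.75, 'mean_rating': 4.0}
```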

(4) Conference attendance
– The funding will support the Faculty representative of the Committee on Research Data Management (CRDM) to attend an overseas research conference or research trip to learn from Research Data Management practice at other institutions.

A report at the end of the project will evaluate the activities of the workshops conducted during the funding period, discuss the implementation and identify further improvement of RDM practice at Science Faculty.

Start Date: 1-Jan-2024
End Date: 31-Dec-2025

Faculty of Social Science
Title: CUHK Social Media Data Repository
Principal Investigator: VAN AMEIJDE Jeroen (School of Architecture)
Co-investigator: ZHU Ling (Department of Sociology)
Abstract: In an era of ever-increasing user-generated content through online social media platforms, exciting new modes of interdisciplinary research can help expand upon existing theories of human behavior.

Within the Faculty of Social Science (FSS), the Computational Social Science Lab (CSSL) platform focuses on important social, environmental, and public health challenges through research and PhD training. In addition to the CSSL, an increasing number of scholars in the Faculty are engaged in data-driven research projects on a broad array of topics such as social stratification, social mobility, crime and deviance, gender issues, transport and mobility behavior, and urban quality of life.

The grant application presented here proposes to support and grow this community by developing an open-access online repository of social media data sets, processed from raw data collections into purpose-specific formats. As obtaining meaningful information from online data sources is a complex and time-consuming challenge, the repository can enable researchers to conduct fast-tracked, high-level data analytics.
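To illustrate what processing a raw collection into a tidy data set might involve, the sketch below converts raw posts in JSON Lines form into CSV, removing duplicate posts, normalizing whitespace and replacing user identifiers with salted hashes. The field names (`id`, `created_at`, `user_id`, `text`) are hypothetical, and salted hashing is pseudonymization rather than full anonymization, so sensitive data sets would still need the restricted-access route.

```python
import csv
import hashlib
import io
import json


def posts_to_csv(raw_lines: list[str], salt: str) -> str:
    """Convert raw JSON-lines posts into a tidy CSV with pseudonymized authors.

    Field names (id, created_at, user_id, text) are hypothetical.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["post_id", "created_at", "author_hash", "text"])
    seen = set()
    for line in raw_lines:
        if not line.strip():
            continue                      # skip blank lines
        post = json.loads(line)
        if post["id"] in seen:            # the same post collected twice
            continue
        seen.add(post["id"])
        # A salted hash replaces the user ID; the salt must be kept private.
        author = hashlib.sha256((salt + str(post["user_id"])).encode("utf-8")).hexdigest()[:16]
        text = " ".join(post["text"].split())  # collapse runs of whitespace and newlines
        writer.writerow([post["id"], post["created_at"], author, text])
    return out.getvalue()


raw = [
    '{"id": "1", "created_at": "2024-01-01", "user_id": 42, "text": "hello   world"}',
    '{"id": "1", "created_at": "2024-01-01", "user_id": 42, "text": "hello   world"}',
    '{"id": "2", "created_at": "2024-01-02", "user_id": 42, "text": "bye\\nnow"}',
]
print(posts_to_csv(raw, salt="keep-this-secret"))
```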

Several well-known universities worldwide provide pre-processed resources to make social media data accessible and useful to researchers, for instance the Social Media Archive (SOMAR) at the University of Michigan. The CUHK Social Media Data Repository (SMDR) could provide regionally specific data or data sets relating to context-specific research projects. As with SOMAR, data can be provided as ‘open access’ or made available upon approval of a restricted data application. The data sets may derive from data previously collected and curated by researchers within the FSS, as well as by affiliated scholars around the world.

In addition to creating this new repository, the initiative will include an internal publicity campaign and a series of short workshops aimed at researchers, postgraduate students, and professorial staff, to introduce the project and promote good research data management (RDM) practice as a valuable facet in striving for research excellence.
The workshops will be open to students and staff from across the FSS and will offer an introductory overview of various methods of social media-based social science research, as well as case study examples and step-by-step guides on data analysis and management. They will highlight the different resources and support available from CUHK to help researchers develop good research data management skills.

As part of this grant application, it is proposed to engage a Research Assistant to develop the first social media data sets, which will be offered as open access. Follow-up grant applications may secure additional funding to expand the Repository with up-to-date data sets and to continue promoting good data management among a wide range of students and staff in the Faculty.

(i) Expected activities:
The PI, Co-I and associated PhD students will liaise to determine suitable content selections and formatting requirements for a series of data sets to be produced during the project. The project Research Assistant will conduct data gathering, processing and cataloguing work and prepare promotional materials and workshop presentations. The project team will liaise with key CUHK Library personnel to coordinate online publication of the data resources.

Three short workshops will be offered during the project period, tentatively on 20 September, 4 October and 18 October 2024.

(ii) Deliverables and key performance indicators:
– 3-4 data sets relevant and applicable to a wide range of Social Science research purposes, collected from large-scale social media platforms such as X (formerly Twitter), Facebook, Instagram, as well as Weibo (China-specific data). The data sets are processed to remove personal information including usernames and profiles.
– Coordination and promotion of the project among at least 25 faculty affiliates from 4 different schools and departments in the FSS, to identify topics of interest, survey current data-driven research methods and establish data set publication formats and protocols.
– Development of a project-specific web page (simple format), to be integrated into the existing FSS and CSSL website frameworks. The website will inform interested researchers about the potential of data-driven social science research, promoting the mission of open access to research data for advancing our understanding of social processes and societies.
– Participants in the workshops will be asked to give feedback on the usefulness of the presentations. An additional survey will be conducted 6 months after the workshop dates, to evaluate whether and how the participants have been able to engage with data-driven research and qualitatively improve their RDM practices.
– To evaluate the uptake of RDM after researchers access the Repository and/or attend the workshops, the project team will request reports from the CUHK Library on the number of Data Management Plans (DMPs) registered by FSS staff and students in the DMP Tool. The utilization rate of RDM among FSS members is anticipated to rise compared to the pre-workshop period. The project team will also engage the Faculty Data Champions to support the organization and promotion of the workshops.

(iii) Timeline for completing the activities and achieving the deliverables/outcomes
– January – March 2024 – recruitment of a suitable research assistant with a background in data science or computer science.
– March – April 2024 – project scope definition, content selection and project production planning.
– May – August 2024 – collection and production of processed data sets, data set testing and finalization.
– September – October 2024 – promotion and organization of the three workshops.
– (in parallel) April – December 2024 – website content development and publication.
– April 2025 – participant survey and project report production.

Start Date 1-Jan-2024
End Date 1-May-2025