Tereza Motalová
works as a research and data management methodologist at UPOL. She also works as a data manager in two research projects. She successfully completed the data steward course in Vienna in the academic year 2023/24.
Photo: UPOL
Dealing with data is highly individual and differs from field to field, but it always has to make sense, say data stewards Tereza Motalová and Martin Schätz.
For most people, data steward is probably a profession that is hard to imagine. However, data stewardship has become a major and necessary trend in recent years, and it is a crucial element of the National Repository Platform. This is where research data is stored and published in a controlled manner on a pilot basis - simply so that scientists can easily examine what others have already discovered and build on work that has already been done. So how does one become a data steward? A good way is to sign up for a course. One of the most renowned has been organized by the University of Vienna since 2022. Since then, dozens of candidates from around the world have completed it, including Tereza Motalová from Palacký University in Olomouc. Martin Schätz from VŠCHT in Prague has taken other courses focused on data stewardship, this time Czech ones. Together they tried to summarize everything they know about this profession and described their path to becoming data stewards. That the whole issue of storing and sharing research data is still in its infancy was evident from how curious they were about each other's experience.
Martin
I've been thinking about this question because I've registered for and attended several courses myself. For me, the most important one in Europe is probably the one in Vienna. It is a two-semester course that costs EUR 3000, but I have not taken it.
Tereza
I did that one.
Martin
We complement each other nicely! The one I took was not strictly a data steward course but a Data Stewardship course created as part of the DocEnhance grant. It is aimed more at educating early-career scientists so that they can master the basic tasks associated with data storage. I asked so vehemently about data stewardship that the university said, "We don't know, here's the course, sign up." Then I kept asking so much that they said, "You know more than we do, come straight in and teach." So I took my first course in 2021 both as a student and as a teacher of the part on creating a data management plan. This year will be its fourth or fifth year at UCT. It's going to ramp up now, and we're recruiting our data stewards from that course. The first year they take it, the second year they participate as teachers, and then we train them in the hands-on part. But it's different from Vienna, where you get a certificate when you complete the course. Is that right?
Tereza
That's right, yes.
Martin
An official piece of paper, I like that. The other course I know about, and which I'm pleased about, is our Czech Data Steward course at the Faculty of Arts of Charles University. I see it as very beneficial for our environment and hope it will continue. I have also done the Train the Trainer (FAIRsFAIR) course, where we were taught how to train and how to think about designing training for data stewards. That was in 2021, and it was free at the time because of a huge boom in Europe. I think this course is still accessible on the European EOSC Moodle platform.
Tereza
My journey was pretty wild. As part of a project running here at Palacký University, there was suddenly a need for follow-up projects, that is, for writing project applications with the aim of targeting the Horizon Europe research and innovation program. This opened a big, in quotes, Pandora's box called Open Science and Research Data Management. So my journey went from project administrator to data manager or, rather, Open Science coordinator. Materials and courses were plentiful then and still are today. It's just difficult to know what's suitable for whom. I was missing something focused on the data steward, as the position is generally called. I remember stumbling across this course in Vienna, but the price put me off. Aha, I thought, that's not going to be feasible for me. Fortunately, I was later reminded of the course by Matyáš Hiřman, who attended it for Charles University. And I happened to be involved in two Horizon Europe projects, which allowed me to finance the course. Because I was, and still am, in charge of data management, I was able to attend the course.
Tereza
Every year there is an application period, and then they select who is admitted to the course. I took it in the 2023/24 academic year. The course is based on five core modules, representing different areas that every data steward should know. The first module focuses on the basics of Open Science and Research Data Management. The second module introduces the basics of IT and data science, covering the fundamentals of databases, programming, versioning, and the Unix shell. From my point of view, the third module was the most essential. It focuses on FAIR data throughout its lifecycle: planning, organizing, processing, and documenting the data, storage and long-term preservation, taking care of the data in terms of security and personal data, and publishing or possibly reusing it, including legal aspects. It also included sub-modules focused on the social sciences, humanities, engineering sciences, and natural sciences, which gave a peek at how things can work in different fields. The penultimate module focused on the training, services, and support that a data steward provides to researchers. The last module is the project. You finish the course by choosing your own topic and a supervisor and agreeing on it with them, and the output might be, for example, a course concept. At the same time, you write a report on your project. As for assessment, there is no test after each module; instead, most modules are based on assignments that you hand in by a particular deadline. Each assignment is worth a certain range of points, and based on the points from the assignments and the project, an overall grade is generated and you get a certificate. This is probably the only certificate course currently available in Europe.
Tereza
The course runs for two semesters. The first module takes place over a week in Vienna, where you're physically present and get to know the environment and the other participants, so it's not entirely anonymous. The other modules then run online two days a month, on Thursdays and Fridays.
Martin
The DocEnhance course doesn't go into that much depth, but what one goes through is similar. It is divided into three parts. The first is self-study, where students learn about the FAIR principles, what Open Science is, and how to publish data, and take a small quiz on each part. At the end of the first module, they take an online exam and receive an official certificate confirming that they passed. The second module is practical. For example, participants are tasked with building a Data Management Plan based on their own research and working out why to publish or not publish data, dealing with licensing or archiving. The third module is aimed at the commercial sector. We approach some companies, and they show us how they work with data and what it can do for them. The graduate of the course then has a broader idea of where data management is relevant. It's much quicker; you can do all three modules in a semester just fine. However, a graduate of the course is not a fully trained data steward. Rather, it's someone with more in-depth experience of data governance and Open Science and an idea of the demands in their specific discipline, always with a focus on the field they are currently in.
Tereza
I took the course as part of my job. I attended the week in Vienna as a business trip, and the online part took place during working hours. I needed extra time to work on the assignments and the project. Combining this with work commitments wasn't always possible, so some evenings and weekends went into it.
Martin
The Data Stewardship course is designed so that one can manage it alongside something else. But it also depends on prior knowledge. The first module took me three afternoons, but it could take someone else two weeks. As for module two, it has six parts and took me about three hours a week.
Tereza
I'll clarify that it definitely wasn't just one assignment per module. A module consisted of five to sixteen sub-modules, and there were several assignments as well. Some were more straightforward, some more complex. The third module was made up of sixteen parts and five assignments to be handed in, so it was quite a challenge.
Tereza
The Vienna course started in the first half of October, and the classes ran until June, including the handing in of assignments. The deadline for project submission was the third week of July. In September, I received a message that we had received a grade for the project and that the certificate would be sent to me in early October. So yes, you could say that it takes a year to complete the Vienna course.
Martin
If I were to estimate how long it would take me to reach the level the Vienna course gives you, but without taking that course, I would say about a year and a half. The other courses are focused a bit differently, so you still need to get some practical experience.
Martin
For a data steward who works for an entire institution, the best route is to take the course in Vienna. He needs to have an overview of multiple disciplines and, in fact, of everything that is done at the university. The other course, which we can call Introduction to Data Stewardship, is better suited to a faculty or team data steward because it focuses specifically on what a scientist does. But we'll have to train the faculty-level one a little bit more.
Tereza
I see it similarly. I recommend the Vienna course to anyone who definitely wants to work in that position, whether centrally or at a faculty. For me, it was more complicated. But even though I took the course as a data manager for two research projects, I found it suitable for my job. Personally, the course helped me to get more established in the field, and as my journey had been quite organic and wild, it finally gave me some confidence that I was going in the right direction.
Martin
I see it as hugely important that the Vienna course includes the service delivery part. Learning how to behave in that position and what to offer is essential in this job, and my course doesn't provide that.
Martin
There is such a huge shortage of courses at the moment that everyone who has found they need something like this is trying to get into one. Anyone with only a marginal interest will probably be crowded out by those who really need a course. But I do see it as positive that things are moving forward: new courses are opening up, and there's a community of data stewards started by Matyáš Hiřman from Charles University.
Tereza
If I had to summarise the composition of my classmates, almost all of them dealt with research data management at their institution in some way. Some came directly from a data steward office or from the data steward network that operates in Vienna, for example. There is a coordinator there who has a network of faculty data stewards under him, and some of them are also trained in this course. Others were from Open Science teams or from research support in general. But mostly, they were people whose current jobs involved data stewardship. They came from all over the world, including the US and Japan.
Martin
That's still a very live question. Back in 2021, it had not yet been defined in the Czech Republic what a data steward is and what they should do. The role and what was needed from it were probably more or less clear, but the position as such was not yet anchored. Everything became clearer thanks to the creation of EOSC and the need to get organized in some way. I don't even dare to guess how many data stewards would be needed. But every institution or university should have someone whom scientists can turn to and who is able to point them in the right direction. That should be the minimum. Ideally, though this will probably take many, many years, each research group will have a team data steward who is trained and knows what to do and how to do it.
Martin
We have an institutional data steward and four faculty data stewards. They're not focused on a specific faculty yet, but they exist. As for Charles University, there is an Open Science Centre under the library, where Matyáš Hiřman and Dagmar Hanzlíková work. I perceive them as very experienced people who have the ability to steer the university somewhere. This is gradually growing in the faculties. I know that the Faculty of Science has a new data steward, and the Faculty of Medicine is looking for someone. It's going to expand because there's a new Open Science grant coming that will support these positions.
Tereza
If we focus strictly on the official university data steward position, we have exactly zero. We have a research data stewardship methodologist in the science and research department in the chancellor's office. That's me. And we also have an Open Science Coordinator in the library. As far as faculty data stewards go, we only have one at CATRIN, the university institute. But the research teams are handling their data somehow; we're not entirely in a vacuum. Where there is a need, some people do this job without calling themselves data stewards. And then we have projects running that already have that position. In general, there are several models of what this might look like at institutions in the future. For example, there can be a centralized university service in which different teams provide research data stewardship support to researchers. Our project partners in Helsinki have this, for example. Or there can be a central team for Open Science and research data management, like in the UK. Or an interconnected network of data stewards, like in Vienna. It depends on how much each institution wants to get involved and how it approaches it. It's undoubtedly one of the hottest topics right now, and the goal is to make research data management part of common practice. However, I still see it as more of a team effort because data stewardship cannot stand on its own.
Martin
You often have to work real magic with how much money you have for the position and whether it comes from a grant or the university can allocate the funding. At our university, we put together one full-time position and split it among several people. And a single grant often doesn't provide enough money for the data steward to devote as much time as is needed. It's something we need to experiment with in the future and set up somehow. Some of the work may be grant-funded, some may be covered by faculty data stewards.
Martin
It is very uneven. There's a period when nothing happens, and then there's a period when we're reviewing a huge number of data management plans, or we need to run training because a new grant call has come out. It's hard to estimate. I have the advantage of having flexible time, so I can say that I'm now going to spend three days in a row just on a data management plan, for example because a deadline is coming up. Let's face it, scientists are not the most organized people and like to do everything at the last minute. And it also depends on how much education is needed. The more we educate, the better. Things are constantly evolving in Open Science, but many of the things that we set out to make work might work in ten, twenty years.
Tereza
That's right.
Martin
Who knows. I'm not sure. It also depends on the enthusiasm of the scientists and how we manage to present it to them. The way I see it now, the European Union is pushing us: we have to. And because we have to, we need more room to show the motivation, why I as a scientist should want this and what it will bring me. At the moment it's more a case of: I have to produce this piece of paper, I have the piece of paper, and that's it.
Tereza
It's true, there are a few enthusiasts who are into it, and they don't care if it's called Open Science or Responsible Science; it's just part of their almost daily practice. Then there's a group of people who need to get more up to speed on what's happening because there's not that much need in their field yet. And then there are the people who often encounter that need through funders. That's where the focus shifted heavily to the Data Management Plan, which is, in quotes, a single document, and is often seen as additional bureaucracy and an additional burden. As a scientist, you're already dealing with many things, and now this comes up as well. It takes time to explain that it makes sense. It's different from cleaning up your apartment and seeing the result. It's more long-term. The benefits come more slowly. Someone tells you: You'll save time. Okay, but I have to make the plan, and it's taking time away from me. So finding the motivation so that it's not perceived as, sorry, a chore, will take time. Ultimately, it's about changing the mindset. As Martin said, this is going to be really long-term.
Tereza
I wouldn't call it a problem; it is more of a challenge. Even when you say "data", everybody may have a different idea of what that means. We can have a lot of discussions about data and find at the end of the day that we all meant something completely different. So finding a common language and explaining the concepts are important. Another example is the term open science, which often comes up together with research data management. However, openness and opening data are just a slice of the whole governance process, because a lot of things have to happen before we can open the data. Moreover, openness scares some people. I talk about openness as a range from completely open data to completely closed data with an open metadata record. Not all data can be opened, for legitimate reasons, and that's fine. Maybe this is what makes communication between different groups a bit difficult. And hand on heart, if a new data steward came in and said, "Yeah, you're doing it right, but you can do it better," I probably wouldn't be too thrilled either. Plus, sitting through one lecture is certainly not enough to understand the whole context of why it's happening. As Dagmar Hanzlíková from Charles University says: "It's not a revolution, it's an evolution, and it just takes time."
Martin
Ultimately, the scientist knows best what data he has and how it should be handled. The challenge for us is how best to tell him the possibilities and what it can do for him. And at least I'm not always as enthusiastic about communicating that as I would like. So that's where I see the challenge, too.
Tereza
I am still figuring out a way to talk about open science in an interesting or even entertaining way. I always see the mood drop when the subject is brought up :)
Martin
It's not wrong to be cautious; it's perfectly fine to give everything a lot of consideration. It might help to stress that everything is very individual from field to field. In medicine, for example, data often cannot be opened for ethical reasons. In other fields, one can be very open and even ask the public to contribute data or suggestions. There's no blanket rule that everyone has to follow equally; it's about setting up a process for each project so that it makes sense and achieves something.
Tereza
One thing is definitely to talk about it. Imagine the journey of a scientist who after a while gets a Horizon Europe grant and suddenly has all these rules thrown at him about what to do. It's not that all his data is in a bad state, but suddenly an apparatus above him defines his path in a certain way. So it's important to talk about it and to know why. Because it's not, excuse the word, some kind of harassment from above; there are real reasons why it's happening and why, for example, it's mandatory. And the second important thing is support. When, for example, the conditions of funding providers or policies at different levels change, scientists must be helped. It is impossible for a scientist to do his research and also be an expert in everything related to research data management.
Martin
For me, communication is important, but more in the sense of openness. If someone is motivated and wants to do it, that's more important than any other prerequisite. Of course, it helps if they have been part of a project or team in the past. The data stewards I know come from a variety of positions. I'm a researcher, others are librarians, and they're all successful. Openness and interest may be enough.
Tereza
I'll add two more thoughts. As for personality, a data steward must also have some sensitivity and empathy. He can be extra enthusiastic, which is great, but that can also scare scientists somewhat. Change won't happen overnight; we're talking years. And secondly, it depends on where the data steward sits in the institution's structure. Generally speaking, the closer he is to the central administration, the more diverse his background can be. He could come from a library or from research, or be a PhD student or someone just out of their PhD. But the closer he is to the research team, the deeper his knowledge of the team and the field should be, because he is already helping in practice and understands those people much better.
Tereza
Yeah, that's right.
Martin
The closer he is to the team position, the more he works with data. And the higher up he is, the more he sets how it should optimally happen.
Tereza
Part of the job is consulting or training.
Martin
And that's where empathy is very important because, as scientists, we are stubborn, and we don't like to admit that we are not doing something the way we should. Sometimes you have to have patience.
Martin
Yes, in some places scientists may already be required by norms and rules to handle data in a certain way, so they do. For example, I come to the microscope and generate an image, and I have to store it somewhere and name it somehow. So in some places scientists are already doing data management, maybe without even knowing it. And elsewhere it's the other way around, and nobody has told the scientist that if they name their files "data 1", "data 2", or "data 3", they might have a problem in a year.
Martin
Let's first agree on the tasks he has to perform, and from there we can approximate what his average day will look like. For a data steward at the institutional level, it will be important to set and consult on institution-wide rules and standards. So he will have to meet with the head of the IT department and with the ethics committee, and these will be high-impact tasks that he revisits regularly. A big part of the workload will be consultation, which I imagine may take 40 to 50% of the time. The more junior the position, the more hands-on work with data there will be. For the institutional data steward, I imagine he will check that the data is published correctly and that the data set is linked to the paper and vice versa.
Tereza
Definitely also monitoring needs. The data steward needs to know what's missing and where, what the needs are, and address them in order of priority. Since we are still at the beginning, self-education is also key. Know the trends. Know the resources, which need to be sorted through. You can't give scientists fifty manuals; they'll eat you alive. You have to do your own research. And, of course, there's educating others. Preparing a good course for others takes a lot of time.
Martin
Admittedly, I forgot my favorite activity, which is precisely surveying needs! Something that has worked well for us in course planning is an idea I adopted from abroad. We call it breakfast with the data stewards. Every once in a while, we make coffee, invite people to bake cookies and bring whatever they need, and maybe during class, we talk about whatever they've come across that they need help with. It always turns into a topic that we discuss next time and offer solutions for. Knowing what people really need is one of the most important things to me.
Tereza
And you also need to make yourself known. Having an institutional data steward is nice, but it's useless if no one knows about them.
Martin
I haven't looked into it in detail, but we have a general idea about each other. If I need to ask a question, I usually get an answer. Recently, there was also an in-person community meeting for the first time, and we weren't actually there. There was so much interest that it filled up very quickly, which just underscores how important this activity is.
Tereza
The community operates mostly virtually, but there are also these accompanying activities. I don't know everybody either, but I'm aware of the members because I occasionally work with them.
Martin
I personally encounter more data storage in the private sector.
Tereza
We are turning into a data-driven society; data is important. This issue extends beyond universities and research institutes. I also expect collaboration with the private sector. A nice example is the project running here at UPOL, which works with the city of Olomouc and the Olomouc Region. They are addressing good practice on how to handle data and what data to open. So the public sector is already involved.
Tereza
This is the main idea of open science. It is going across sectors, not just staying in the university environment. It's definitely collaborative, but with a little red flag - don't open up all the data at all costs. Share, but share responsibly.
Martin
In terms of science, it's a great motivation to move forward faster. I do data science, and if I want to develop an algorithm or a procedure to evaluate data better, I can't do it without data. I can have the best ideas, but if I don't have open access to the data, I have to search hard for someone who wants to work with me and lend me data before I can even start. This way, I can look in a repository, and if the data is well described, I can access it immediately. In the same way, I can compare myself with others to see how good my work is. And the commercial sector can do the same in the future.
Tereza
One more thing about openness - the data itself doesn't always have to be open. But the mere fact that it is stored in a repository and traceable through metadata, so that I know it exists and can contact a particular person, ask for it, and get access if certain conditions are met, is meaningful. There are different ways to share data.
Martin
Now, if I put myself in the position of the scientist who spends time and energy and goes through all the trouble to collect the data, I'm sure I'll want to show somewhere that I've done that. Even if I put the data closed in the repository, there will be a record somewhere that I actually worked on this and produced something.
Martin
That's right.
Martin Schätz
works as an institutional data steward and researcher in Data Science at the University of Chemistry and Technology, Prague. There he also runs the DocEnhance Data Stewardship course, which he completed alongside the FAIRsFAIR Train the Trainer course in 2021. He also draws on his experience at the National Technical Library and the Open Access Microscopy Core Facility at the Faculty of Science, Charles University.
Photo: Aleš Balda, VŠCHT