"If data were available so that researchers could pick up where others left off, it would certainly be the right path," says Roman Čampula

Roman Čampula works at the Transport Research Centre (CDV) as the head of the department of traffic behavior analysis and traffic modeling, handling traffic data. He welcomes the EOSC CZ initiative and envisions every research project ending with data being stored in a large national database. He sees a future in open science and admits that he and his team are already working with open data.

17 Jun 2024 Martina Čelišová

You work as the head of the department of traffic behavior analysis and traffic modeling. What exactly does your job entail, and how did you come to this position?

I've been working at CDV for ten years; I joined straight out of university as a programmer. Initially, I was more involved in research, processing data from colleagues conducting road surveys and data from toll gates on Czech motorways so the ministry could understand where people were traveling to and from and how many people were on the roads. I also focused on traffic modeling, which is a specialty not many people in the Czech Republic engage in. Now, as the head of our ten-member department, my role also involves work planning, budget preparation, and collaboration with cities.

Do your data and your work have a significant impact on how wider society functions?

Although we work in a research organization, we strive to transfer our research into practice for Czech and Slovak cities. We develop sustainable urban mobility plans, which are strategic documents that every city with over forty thousand inhabitants must have to access European and national funding. These plans outline how a city should function transport-wise over the next thirty years, detailing which modes of transport to support, how to reduce car traffic, and how to address parking or cycling.

Do cities approach you, or do you reach out to them?

We have a network of dozens of cities we collaborate with, and the initiative mainly comes from them. If they need to convert a regular intersection into a signalized one, they ask us how many cars pass through it, how many pedestrians cross it, and whether the location meets the standards for a signalized intersection. We measure and evaluate the site for them.

How are your results and research data accessible within CDV and outside CDV?

It depends on each researcher. My colleagues and I have established a system where everything we create and collect is stored in one place, ideally in a standardized format. This is fully accessible to our team and available when we start working with another city. Other CDV staff can also view the data, but they must know it exists. Sometimes we encounter the problem that someone is unaware of a survey we conducted and intends to replicate it.

Outside CDV, I'd divide it into two areas. For research projects, the collected data is usually accessible because these are grant-funded projects using public funds, so the final reports are published, or the project has its own website. For commercial projects, the results are not publicly available since the rights transfer to the client, and we must negotiate with them if we want to use the data elsewhere.

Do you have someone responsible for managing the data?

We have several traffic areas, and each one handles data management in its own way. My team and I strive to maintain a shared data repository where the data is stored in a structured format on shared drives. The data is accompanied by metadata describing what it is, its structure, the year, the city or area it covers, its level of detail, whether it needs updating, who manages it, how it is stored, and in which versions. But this is not a rule at CDV; it's more about my colleagues and me ensuring we can always access our data.

Do you use external data, perhaps from abroad?

Rather than using numerical data from abroad, we utilize various data methodologies and guidelines from Austria or Germany. These are typically useful for sustainable mobility plans, which involve surveys on where people travel, for what purpose, and how often. So, there is inspiration from abroad from our closest neighbors, but data on the number of trips, for example, cannot be transferred here from abroad.

Can you imagine providing data to universities? Has anyone ever requested data from you?

It happens. Not very often, but it does. We usually provide data from our largest survey, which involved ten thousand Czech households, investigating how all household members travel. The results are freely available on the project’s website, so anyone can download them. There are thousands of rows of data in Excel. There are many parameters, so this research project may not be entirely understandable to the general public. Universities or cities typically approach us, asking for help explaining the data. We assist them in this regard.

This is a good example of what is called open data. Do terms like open science or open access mean anything to you?

We would be very happy if the results and data we collect and process could be stored in a way that lets others use them. It would be great if we weren't the only ones using them. The software we develop is published for free or with open source code so that someone can continue our work. This isn't always possible due to licensing or commercial reasons, but for the research part, all outputs can be shared and stored in a public place without any problems. We don't do this fully yet, but I see no problem in doing so in the future if someone motivated us.

Do you see any negatives in this?

It's essential to describe in detail how the data was collected: what methodology was used, how the data was processed, for what purpose, and what conclusions were drawn. Often, this description is missing. Sometimes we come across a dataset that looks interesting but lacks any background information, such as for what purpose and for how long the data was collected, what the sample population was, how exactly the data was evaluated, or how non-responses were handled. Without this information, it's challenging to build upon the data. Ideally, data should come with a detailed report on how it was created.

A national data repository is being established. Can you imagine being part of it and contributing to it?

Certainly. If someone told us what format the data should be stored in, we would convert it and store it. Cities could then use it and draw inspiration from one another. As a research organization, we could use data collected by someone else in a city where we haven't worked. Comparisons could lead to interesting competition among cities, for example over how successful they are in motivating motorists to switch to other forms of transport. It seems to me that little is published about how things stand in Czech cities. Cities prepare projects for themselves, but no one else gets access to the results. If data and analysis results were available somewhere, it wouldn't always be necessary to start from scratch; we could build on existing work. Sometimes we see the same work being done several times over, even though that isn't necessary. Transport operations differ in many ways from place to place, but there are also similarities, and these are investigated again and again. Public funds are repeatedly spent on completing a specific analysis because it is impossible to obtain data from a single source.

That’s why the nationwide transport behavior survey I mentioned was created. These are public data that anyone can build on, whether they come from the private sector, research, or public administration. It is also a good route to publication in scientific journals. I've encountered this several times. We had a discussion about whether to pay for open access, because the author often has to pay to make the article freely available. When I, as a researcher, come up with something, I want to publish it to the widest possible audience, possibly even abroad. If someone has to pay for my outputs, perhaps through a journal subscription, it significantly limits the reach. Likewise, I can't access many research outputs or articles because someone decided not to subscribe to that particular journal. If as much as possible were available in an open form, that would be great.

EOSC aims to help scientists with data storage. Do you think it is necessary for scientists and researchers to have this service?

I find this initiative very appealing in general, and some awareness-raising around it would be a good thing. If someone gave us clear instructions about the existence of a public data repository and the conditions for storing data there, we would certainly make it part of our scientific work. Every project would simply end with our outputs being stored somewhere to serve other researchers. I would really like that.

Ing. Roman Čampula

works at the Transport Research Centre (CDV) as head of the department of traffic behavior analysis and traffic modeling, where he processes traffic data. He welcomes the EOSC CZ initiative and can imagine every research project ending with its data stored in a large national database. He sees a future in open science and says that he and his team already work with open data.