Cite this as
Yuan Q, Yang Z, Xiao Y (2024) Causal inference of Seoul bike sharing demand. Comput Math Appl. 2(1): 005-009. DOI: 10.17352/cma.000005
Copyright Licence
© 2024 Yuan Q, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The global surge in environmental consciousness has significantly boosted the demand for rental bikes, particularly in metropolitan areas such as Seoul. This study delves into the causal relationships affecting this demand using a dataset from Seoul’s bike-sharing system. Unlike previous research focusing predominantly on predictive analytics, this work innovatively applies multiple linear regression models to uncover causal inferences, offering insights that extend beyond mere forecasting. The challenges addressed include dealing with non-linear relationships and heteroscedasticity by employing the logarithmic transformation of rental counts. This approach not only aids in normalizing the data but also enhances the interpretability of the regression outcomes, emphasizing the changes in demand as a function of various environmental and temporal variables. Recent developments in causal inference methodologies have allowed for more robust and detailed analysis, paving the way for this study’s contribution to the field. The findings underscore the significant influence of factors such as hour of the day, humidity, and seasonal changes on bike rental volumes, which can inform policy-making and operational strategies in urban transport planning.
In recent years, the demand for rental bikes has been steadily increasing in metropolitan areas worldwide, driven by a growing global trend towards environmental protection and sustainable transportation [1-3]. Bike-sharing systems offer a convenient and eco-friendly alternative to traditional modes of urban mobility, allowing users to rent bicycles for short trips and return them to designated docking stations [4,5]. However, providing cities with a stable supply of rental bikes to meet the fluctuating demand has become a major challenge for bike-sharing operators [6,7]. Understanding the factors that influence bike rental demand is crucial for optimizing fleet management, improving user satisfaction, and promoting sustainable urban mobility [8-10]. While rental bikes serve as a key component of urban mobility, it is important to consider alternative options such as public transportation, private vehicles, walking, and other micro-mobility solutions like scooters [11-13]. Despite the presence of these alternatives, bike sharing remains a dominant force in the realm of sustainable transportation [14,15].
The topic of bike sharing demand has attracted significant attention from researchers in recent years [16-18]. Numerous studies have explored various aspects of bike-sharing systems, including demand prediction [19,20], user behavior analysis [21,22], and system optimization [23,24]. These studies have employed a wide range of methodologies, such as linear regression [25], time series analysis [26], and machine learning techniques like neural networks [27,28]. However, the majority of these works focus primarily on accurate demand prediction rather than causal inference [29,30]. While accurate prediction is undoubtedly valuable for operational planning, understanding the causal relationships behind bike rental demand is crucial for designing effective interventions and policies to encourage sustainable transportation.
Causal inference is a statistical approach that aims to identify the true causal effects of variables on an outcome of interest, going beyond mere correlations [9,10]. In the context of bike sharing demand, causal inference can help uncover the factors that directly influence user behavior and rental patterns, such as weather conditions, time of day, or bike infrastructure [3,15,20]. Several studies have applied causal inference techniques to investigate bike-sharing demand. For example, [11] used a difference-in-differences approach to evaluate the impact of a policy change on bike rental demand, while [12] employed a regression discontinuity design to estimate the effect of weather on bike usage. However, there remains a need for more comprehensive studies that apply causal inference methods to large-scale bike-sharing datasets, considering a wide range of potential causal factors [29,30].
To address this gap, this paper uses a dataset of Seoul bike-sharing demand and attempts to identify the key factors that contribute to the demand for rental bikes. By employing multiple linear regression models and analyzing the causal relationships between various independent variables and bike rental demand, this study aims to provide valuable insights for policymakers and bike-sharing operators. The findings can inform strategies to optimize bike fleet management, improve user experience, and promote sustainable urban mobility in Seoul and beyond [6,8,13,24].
The dataset used in this paper is from the UCI Machine Learning Repository [31]. It records the number of rental bikes in Seoul every hour from December 1, 2017, at 0:00 to November 30, 2018, at 23:00, a window of 365 days and therefore 8,760 hourly records. Table 1 lists the variables in the dataset and their descriptions.
Among these variables, we do not use the “Functioning Day” variable as a regressor, because the number of rental bikes is 0 whenever the rental service is not operating. We therefore deleted the 295 observations where “Functioning Day” is “No”, leaving a total of 8,465 observations for the analysis.
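The sample size can be checked directly from the stated observation window. The short sketch below is only arithmetic on the dates and the deletion count given in the text; it does not read the dataset itself.

```python
from datetime import datetime, timedelta

# Observation window stated in the text: hourly records from
# 2017-12-01 00:00 through 2018-11-30 23:00, both endpoints included.
start = datetime(2017, 12, 1, 0)
end = datetime(2018, 11, 30, 23)

# Number of hourly timestamps in the window (inclusive of both ends).
n_hours = int((end - start) / timedelta(hours=1)) + 1

# Number of rows with "Functioning Day" == "No", as reported in the text.
n_non_functioning = 295

# Rows left for the regression analysis after the deletion.
n_used = n_hours - n_non_functioning
print(n_hours, n_used)  # prints: 8760 8465
```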
We employed multiple linear regression models to make causal inferences about the factors influencing bike rental demand [32]. The key assumption behind this approach is that the regression models can adequately capture the causal relationships between the independent variables and the dependent variable by controlling for multiple potential confounding factors simultaneously. By estimating the coefficients of the independent variables and assessing their statistical and economic significance, the models aim to identify the factors that have a causal impact on bike rental demand.
We used the natural logarithm of bike rental counts as the dependent variable in the regression models, for the following reasons. First, taking the natural logarithm can help transform a potentially nonlinear relationship between bike rental counts and the influencing factors into a linear one, making the linear regression model more applicable. Second, it can reduce heteroscedasticity, which occurs when the conditional variance of the dependent variable varies with the levels of the independent variables. Third, it can improve the normality of the residuals, as the distribution of bike rental counts may be right-skewed. Fourth, when the dependent variable is log-transformed, the coefficients have a more intuitive interpretation: a coefficient multiplied by 100 is approximately the percentage change in bike rental counts for a one-unit change in the corresponding independent variable, holding the other variables fixed. Finally, it reduces the difference in scale between the dependent variable and the regressors, making the coefficients easier to compare.
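The percentage interpretation can be made precise with a short derivation. The notation below (Y for the rental count, x_j for a generic regressor, \beta_j for its coefficient, \varepsilon for the error term) is introduced here for illustration and is not taken from the paper's tables.

```latex
\ln Y = \beta_0 + \sum_{j} \beta_j x_j + \varepsilon
\quad\Longrightarrow\quad
\frac{Y(x_j + 1)}{Y(x_j)} = e^{\beta_j},
\qquad
\%\Delta Y = 100\,\bigl(e^{\beta_j} - 1\bigr) \approx 100\,\beta_j
\quad \text{for small } |\beta_j| .
```

For a hypothetical coefficient of 0.05, the exact effect is 100(e^{0.05} - 1) ≈ 5.13%, close to the approximate 5%. For a larger hypothetical coefficient such as -0.5, the exact effect is about -39.3% rather than -50%, so the exact formula is the safer choice when coefficients are not small.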
We considered several multiple linear regression models, each with the natural logarithm of bike rental counts as the dependent variable. The numeric independent variables are Hour, Temperature, Humidity, Wind speed, Visibility, Dew point temperature, Solar Radiation, Rainfall, and Snowfall. Season and Holiday are categorical and can be regarded as random samples across multiple periods. We therefore introduced three dummy variables for season (Spring, Summer, and Autumn, with Winter as the omitted reference category) and one dummy variable for holiday. Table 2 lists the variables used in the models.
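The dummy coding described above can be sketched as follows. The label strings and the function name `encode_row` are hypothetical, and treating Winter and non-holiday as the reference levels is an assumption implied by having three season dummies and one holiday dummy.

```python
# Season dummies named in the text; Winter is assumed to be the omitted baseline.
SEASON_DUMMIES = ("Spring", "Summer", "Autumn")


def encode_row(season: str, holiday: bool) -> dict:
    """Return the 0/1 dummy variables for a single observation."""
    dummies = {name: int(season == name) for name in SEASON_DUMMIES}
    dummies["Holiday"] = int(holiday)
    return dummies


# A non-holiday summer hour sets only the Summer dummy.
print(encode_row("Summer", False))
# A winter holiday hour has all season dummies at 0 and Holiday at 1.
print(encode_row("Winter", True))
```

Omitting one season avoids perfect collinearity with the intercept, so each season coefficient is read as a difference relative to the omitted season.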
The following four models are considered:
Model (1): Using Hour, Temperature, and Humidity as independent variables.
Model (2): Using Hour, Temperature, Humidity, Wind speed, Visibility, and Dew point temperature as independent variables.
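Model (3): Adding Solar Radiation, Rainfall, and Snowfall to the independent variables of Model (2).
Model (4): Adding the Season and Holiday dummy variables to the independent variables of Model (3), so that all variables are included.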
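The four specifications are nested, each extending the previous one. The sketch below writes them as predictor lists and fits the smallest one by ordinary least squares on synthetic, noise-free data. It is a dependency-free Python illustration and not the authors' R code; the column labels, the helper `ols`, and the generating coefficients are all made up for the example.

```python
# Predictor sets for the four nested models (labels are illustrative).
M1 = ["Hour", "Temperature", "Humidity"]
M2 = M1 + ["Wind speed", "Visibility", "Dew point temperature"]
M3 = M2 + ["Solar Radiation", "Rainfall", "Snowfall"]
M4 = M3 + ["Spring", "Summer", "Autumn", "Holiday"]
MODELS = {1: M1, 2: M2, 3: M3, 4: M4}


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows; an intercept column of ones is prepended, so the
    first returned coefficient is the intercept.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic Model (1) data: hour, temperature, humidity on a deterministic grid.
X = [[i % 24, (7 * i) % 30 - 5, (13 * i) % 100] for i in range(200)]
# Made-up coefficients: intercept, Hour, Temperature, Humidity.
TRUE_BETA = [5.0, 0.04, 0.03, -0.02]
# Noise-free log rental count, so the fit should recover TRUE_BETA.
log_count = [TRUE_BETA[0] + sum(b * x for b, x in zip(TRUE_BETA[1:], row))
             for row in X]

beta_hat = ols(X, log_count)
print([round(b, 6) for b in beta_hat])
```

Because the synthetic outcome contains no noise, the fitted coefficients match the generating ones up to floating-point error. With real data, a statistics package that also reports standard errors would be the more practical choice.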
Table 3 shows the results obtained using R software. The numbers outside the parentheses are the estimated coefficient values; the numbers in parentheses are standard errors. Estimates with ** are statistically significant at the 1% level, and those with * are statistically significant at the 5% level.
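The star notation can be reproduced from a coefficient and its standard error. The sketch below uses normal critical values, which is a large-sample approximation to the t-test and an assumption on our part. The function name `stars` and the numeric pairs are hypothetical and are not values from Table 3.

```python
def stars(coef: float, se: float) -> str:
    """Significance marker from a two-sided z-test of H0: coefficient = 0.

    Uses normal critical values as a large-sample approximation.
    """
    z = abs(coef / se)
    if z > 2.576:  # significant at the 1% level
        return "**"
    if z > 1.960:  # significant at the 5% level
        return "*"
    return ""


# Hypothetical (coefficient, standard error) pairs for illustration only.
for coef, se in [(0.040, 0.002), (-0.010, 0.0045), (0.003, 0.004)]:
    print(f"{coef:.3f} ({se:.4f}){stars(coef, se)}")
```

With several thousand observations, the t distribution is very close to the normal, so the two sets of critical values give essentially the same markers.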
The analysis of variables is shown in Table 4.
Interpretation:
In conclusion, this paper investigates the factors influencing the demand for rental bikes in Seoul using the Seoul bike-sharing demand dataset. By employing multiple linear regression models and analyzing the statistical and economic significance of the estimated coefficients, this study identifies several key factors that have a causal impact on bike rental demand: Hour, Humidity, Dew point temperature, Rainfall, and the dummy variables for Season and Holiday. The results suggest that these factors play a crucial role in determining the demand for rental bikes in Seoul. Furthermore, the paper highlights the importance of considering causal relationships rather than focusing solely on prediction accuracy when analyzing bike-sharing demand.
The methodology and findings of this study have potential applications beyond Seoul. Bike-sharing programs are becoming increasingly popular in cities around the world as a sustainable mode of transportation. Future research could apply similar causal inference techniques to analyze bike-sharing demand in other regions and countries, taking into account local contextual factors. This could provide valuable insights for policymakers and bike-sharing operators looking to optimize their systems and promote sustainable urban mobility.
The insights gained from this study can serve as a foundation for further research and policy decisions aimed at enhancing bike-sharing systems and encouraging sustainable transportation. By understanding the key factors that influence bike rental demand, policymakers and operators can develop targeted strategies to improve system efficiency, user satisfaction, and overall ridership.
Moreover, the causal inference approach employed in this study can be extended to investigate the impact of other potential factors on bike sharing demand, such as the built environment, public transit integration, or socio-economic characteristics of users. As cities continue to grapple with the challenges of congestion, air pollution, and climate change, bike sharing offers a promising solution for promoting active, low-carbon mobility. By leveraging the insights from this research, cities can create more resilient and adaptable bike-sharing systems that contribute to the broader goals of sustainable development.
Authors’ contributions: Quan Yuan conceived the idea and performed the programming; Zhixin Yang and Yayuan Xiao contributed to the writing and revision of the manuscript.
The authors would like to express their gratitude to the reviewers and editors for their valuable comments and suggestions, which have greatly improved the quality of this manuscript.