How Apache DolphinScheduler Conquers Data Processing Challenges For Bosch Smart Driving
Speaker Introduction
Tao Chaoquan, a backend engineer at Bosch Smart Driving (China), is responsible for data processing and data scheduling. He has extensive hands-on experience in smart driving data processing. In December 2024, at the Apache DolphinScheduler community online meetup, he presented how Bosch applies Apache DolphinScheduler to smart driving data processing and outlined plans for future development.
Business Background
Bosch Smart Driving (China) is part of the Bosch Group, whose full name is Robert Bosch GmbH. Founded in 1886 and headquartered in Germany, the group has more than 420,000 employees and is present in over 50 countries. Its business covers four major areas: automotive and intelligent transportation technology, industrial technology, consumer goods, and energy and building technology.
This talk focuses on how Bosch adapted Apache DolphinScheduler and applied it to its smart driving business.
The development of smart driving technology is highly dependent on data. Data is not only the cornerstone of model training but also the key to functional verification. Smart driving models require a large amount of high-quality data for training to improve the accuracy of perception, decision-making, and control. At the same time, to ensure the reliability and safety of the system, real-world vehicle functional verification also requires a variety of test data.
Integration and Adaptation
Before Adoption
Before adopting Apache DolphinScheduler, Bosch Smart Driving relied on Jenkins, with workflow orchestration and scheduling implemented inside the business code. This approach is highly flexible, since any form of workflow orchestration can be defined. Its drawback is equally clear: tight coupling with the business code. Any change to a workflow requires modifying the business code, which increases maintenance complexity and risk.
After Adoption
After evaluating scheduling systems, Bosch Smart Driving chose Apache DolphinScheduler and made a series of adaptations on top of version 3.2.0 to improve the efficiency and flexibility of data processing.
The specific adaptations are described below.
MQ Trigger
Building on the data source feature, Bosch Smart Driving added support for creating message sources and binding them to workflows, so that an incoming message triggers the bound workflow automatically. This lets workflows respond more flexibly to changes in the data source.
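The binding between message sources and workflows can be pictured with a minimal sketch. This is not Bosch's implementation or a DolphinScheduler API: the class MessageSourceRegistry, the start_workflow callback, and the source and workflow names are all hypothetical, chosen only to illustrate the idea that one message source can trigger every workflow bound to it.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageSourceRegistry:
    """Hypothetical registry mapping a message source (e.g. an MQ topic)
    to the workflows bound to it."""

    def __init__(self, start_workflow: Callable[[str, dict], None]) -> None:
        # start_workflow is an assumed callback that launches a workflow run.
        self._bindings: Dict[str, List[str]] = defaultdict(list)
        self._start_workflow = start_workflow

    def bind(self, source: str, workflow_code: str) -> None:
        """Bind a workflow to a message source."""
        self._bindings[source].append(workflow_code)

    def on_message(self, source: str, payload: dict) -> List[str]:
        """Start every workflow bound to the source; return what was started."""
        triggered = []
        for workflow_code in self._bindings.get(source, []):
            self._start_workflow(workflow_code, payload)
            triggered.append(workflow_code)
        return triggered


# Usage: record the triggered runs instead of calling a real scheduler.
started = []
registry = MessageSourceRegistry(lambda code, payload: started.append((code, payload)))
registry.bind("raw-drive-data", "wf_ingest")
registry.on_message("raw-drive-data", {"path": "/data/a"})
```

In a real deployment the on_message method would be called from an MQ consumer loop, and start_workflow would call the scheduler.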
Node Enhancement
Bosch Smart Driving relies heavily on K8S tasks and dynamic tasks for orchestration and has made several key changes to DolphinScheduler in this area, including:
Main Process and Sub-process: Optimized the management of the main process and sub-processes.
Custom Plugin: Allowed custom plugins to meet specific business needs.
Modify the Sub-process Generation Rules of Dynamic Nodes: Adjusted the sub-process generation rules of dynamic nodes to better control parameter output.
Asynchronous Trigger & Polling: Implemented asynchronous triggering and polling mechanisms to improve task response speed.
Conditional HTTP: Introduced conditional HTTP requests to support more complex workflow logic.
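Of the items above, asynchronous triggering with polling is the easiest to illustrate. The following is a minimal sketch under stated assumptions, not the actual plugin code: the function trigger_and_poll, the trigger and get_status callbacks, and the status strings "SUCCESS" and "FAILED" are hypothetical. The task submits work, returns immediately with a job id, and then polls for a terminal state instead of blocking on one long-lived call.

```python
import time
from typing import Callable


def trigger_and_poll(
    trigger: Callable[[], str],
    get_status: Callable[[str], str],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit a job asynchronously, then poll until it reaches a terminal state.

    trigger and get_status are assumed callbacks onto some external system.
    """
    job_id = trigger()  # returns immediately with an identifier
    deadline = clock() + timeout_s
    while True:
        status = get_status(job_id)
        if status in ("SUCCESS", "FAILED"):  # hypothetical terminal states
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)


# Usage with a fake external system that finishes on the third poll.
statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
final = trigger_and_poll(
    trigger=lambda: "job-1",
    get_status=lambda job_id: next(statuses),
    interval_s=0,
    sleep=lambda seconds: None,
)
```

Passing sleep and clock as parameters keeps the sketch testable without real waiting.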
Dynamic Priority
Bosch Smart Driving also implemented dynamic priority functionality based on Apache DolphinScheduler to meet the needs of different business scenarios and ensure that key tasks can be executed with priority.
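One common way to support priorities that change while tasks are waiting is a heap with lazy invalidation. The sketch below is illustrative only and does not reflect how DolphinScheduler's Master stores its queue. The class DynamicPriorityQueue is hypothetical, and it assumes that a lower number means a higher priority.

```python
import heapq
import itertools


class DynamicPriorityQueue:
    """Hypothetical task queue whose priorities can be changed after enqueue.

    Assumption: a lower number means a higher priority.
    """

    def __init__(self) -> None:
        self._heap = []
        self._entries = {}  # task_id -> live heap entry
        self._counter = itertools.count()  # tie-breaker, keeps FIFO order

    def put(self, task_id: str, priority: int) -> None:
        """Add a task, or change the priority of a task already queued."""
        if task_id in self._entries:
            # Mark the old entry as stale; it is skipped when popped.
            self._entries[task_id][2] = None
        entry = [priority, next(self._counter), task_id]
        self._entries[task_id] = entry
        heapq.heappush(self._heap, entry)

    def pop(self) -> str:
        """Return the queued task with the highest priority."""
        while self._heap:
            _, _, task_id = heapq.heappop(self._heap)
            if task_id is not None:
                del self._entries[task_id]
                return task_id
        raise KeyError("queue is empty")


# Usage: task "a" is promoted after it was enqueued with a low priority.
queue = DynamicPriorityQueue()
queue.put("a", 5)
queue.put("b", 3)
queue.put("a", 1)
first = queue.pop()  # "a"
```

Marking stale entries instead of removing them avoids re-sorting the heap each time a priority changes.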
Best Practices
Deployment Architecture
Bosch Smart Driving adopted K8S deployment to achieve isolation between control clusters and computing clusters. This isolation strategy includes:
Namespace Isolation: Through namespace-level isolation, logical separation between different tasks is achieved.
Node Isolation: Node-level isolation ensures that computing tasks cannot cause control nodes to be evicted through resource competition or load.
Cluster Version
Bosch Smart Driving uses the TTL controller, a Kubernetes mechanism that controls how long a finished Job is kept before it is deleted. This feature became generally available (stable) in Kubernetes v1.23. Note that running on older versions may increase pressure on the Kubernetes cluster and even cause the DolphinScheduler worker to run out of memory (OOM), so older versions should be used with caution.
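In a Job manifest, the TTL is set through the spec.ttlSecondsAfterFinished field. The sketch below builds such a manifest as a plain Python dictionary. The helper build_job_manifest, the job name, the image, and the 600-second value are hypothetical examples, not settings taken from the talk.

```python
def build_job_manifest(name: str, image: str, ttl_seconds: int = 600) -> dict:
    """Build a batch/v1 Job manifest that is cleaned up after it finishes.

    The name, image, and TTL value are illustrative assumptions.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # The TTL controller deletes the Job this many seconds after it
            # completes or fails.
            "ttlSecondsAfterFinished": ttl_seconds,
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": name, "image": image}],
                }
            },
        },
    }


manifest = build_job_manifest("frame-extract", "registry.example.com/frame-extract:1.0")
```

The dictionary can be serialized to YAML or JSON and submitted with whatever Kubernetes client is in use.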
K8S Task Configuration
In terms of K8S task configuration, Bosch Smart Driving offers the following suggestions:
Task Parameter Passing: Avoid passing large JSON payloads as parameters. Where possible, exchange data through files and pass the file address as the parameter to reduce network transmission overhead.
Resource Quota: For long-running K8S tasks, set request and limit to the same value where possible to avoid OOM caused by resource overcommitment.
IO Control: For IO-intensive tasks, avoid heavy local disk reads and writes where possible, and use CFS (Comprehensive File System) to reduce the impact on other tasks on the same node.
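The first two suggestions can be sketched as a container spec built in Python. This is an illustration under assumptions, not the configuration used by Bosch: the helper build_container, the --input flag, the file address, and the CPU and memory values are hypothetical. Setting requests equal to limits for both CPU and memory on every container is also what places a pod in the Kubernetes Guaranteed QoS class.

```python
def build_container(name: str, image: str, input_uri: str,
                    cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a container spec with equal requests and limits.

    The data itself stays in a file; only its address (input_uri) is passed
    as an argument. All concrete values are illustrative assumptions.
    """
    resources = {"cpu": cpu, "memory": memory}
    return {
        "name": name,
        "image": image,
        # Pass a file address instead of a large inline JSON payload.
        "args": ["--input", input_uri],
        "resources": {
            "requests": dict(resources),
            "limits": dict(resources),  # same as requests, no overcommitment
        },
    }


container = build_container(
    "label-merge",
    "registry.example.com/label-merge:1.0",
    "/mnt/shared/batches/0001/params.json",
)
```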
K8S Task Isolation & Dynamic Priority
To handle different types of K8S tasks being scheduled onto the same K8S cluster, Bosch Smart Driving proposes the following solutions:
Support for Dynamic Modification of Task Priority on Master: Allows task priority to be adjusted dynamically to meet different business needs.
Allocation of Different Types of Tasks to Different Nodes through Node Labels and Tolerations: This keeps different types of tasks isolated in terms of resource usage while each type keeps its own priority.
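Node-level placement of this kind is expressed in a pod spec through nodeSelector and tolerations. The sketch below is a minimal illustration and not the cluster configuration from the talk: the helper with_node_isolation and the task-type label and taint key are hypothetical. It assumes the target nodes carry a matching label and a NoSchedule taint with the same key and value.

```python
def with_node_isolation(pod_spec: dict, task_type: str) -> dict:
    """Return a copy of pod_spec pinned to nodes reserved for task_type.

    Assumes nodes are labeled task-type=<task_type> and tainted with
    task-type=<task_type>:NoSchedule; both names are hypothetical.
    """
    isolated = dict(pod_spec)
    # nodeSelector steers the pod toward the labeled nodes.
    isolated["nodeSelector"] = {"task-type": task_type}
    # The toleration lets the pod onto tainted nodes that reject other pods.
    isolated["tolerations"] = [
        {
            "key": "task-type",
            "operator": "Equal",
            "value": task_type,
            "effect": "NoSchedule",
        }
    ]
    return isolated


base_spec = {"restartPolicy": "Never", "containers": [{"name": "job", "image": "busybox"}]}
io_spec = with_node_isolation(base_spec, "io-intensive")
```

The label selects where a task may run, while the taint keeps other task types off those nodes, so both are needed for isolation in both directions.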
Future Planning
Finally, Bosch Smart Driving plans to implement new features and further improvements, including task resource isolation and integration with CI/CD, to further enhance the efficiency and stability of smart driving data processing. These plans will support Bosch Smart Driving's technological progress and business development in the field of smart driving.
Conclusion
This talk not only shows how Apache DolphinScheduler is applied to smart driving data processing at Bosch, a century-old company, but also offers valuable practical experience and directions for future development. You are welcome to learn more about and join the Apache DolphinScheduler community for further information and resources, and to help advance smart driving technology together.