Uber operates the world's largest ride-hailing network, serving just under 1,000 metropolitan areas across 70 countries.
As companies like Uber grow and expand their operations, so does the volume of data and associated metadata they produce every day. These firms invest heavily in their data and work to make it easy for their growing analytics teams to find the data they need.
To facilitate internal data discovery, Uber built its own dataset search and management tool called Databook. A single interface into the company's metadata graph, Databook indexes hundreds of thousands of datasets, millions of columns and fields, and hundreds of thousands of other data entities such as dashboards and pipelines.
At a high level, Databook ingests metadata from various sources (primary data stores, services, and crawlers) and makes it accessible to end users through a unified search interface. Users can search for indexed data entities, which are updated in real time, and view additional signals such as usage statistics and quality trends.
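The ingest-then-search flow described above can be illustrated with a toy in-memory catalog. This is only a minimal sketch under stated assumptions, not Databook's actual design: the `Entity` and `Catalog` classes, their field names, the source labels, and the `query_count` usage signal are all hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """One catalog entry. All field names are hypothetical."""
    name: str
    kind: str             # e.g. "dataset", "dashboard", "pipeline"
    source: str           # which source reported the metadata
    description: str = ""
    query_count: int = 0  # stand-in for a usage statistic


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


class Catalog:
    """Toy metadata catalog with an inverted index for keyword search."""

    def __init__(self):
        self._entities = {}  # (kind, name) -> Entity
        self._index = {}     # token -> set of (kind, name) keys

    def ingest(self, entity):
        """Add or update an entity so that searches see it immediately."""
        key = (entity.kind, entity.name)
        old = self._entities.get(key)
        if old is not None:
            # Drop stale tokens from the previous version of this entity.
            for token in tokenize(f"{old.name} {old.description}"):
                self._index[token].discard(key)
        self._entities[key] = entity
        for token in tokenize(f"{entity.name} {entity.description}"):
            self._index.setdefault(token, set()).add(key)

    def search(self, query):
        """Return entities matching every query token, most used first."""
        tokens = tokenize(query)
        if not tokens:
            return []
        keys = set.intersection(*(self._index.get(t, set()) for t in tokens))
        hits = [self._entities[k] for k in keys]
        return sorted(hits, key=lambda e: (-e.query_count, e.name))


catalog = Catalog()
catalog.ingest(Entity("trips_daily", "dataset", "warehouse", "Daily trip aggregates", 120))
catalog.ingest(Entity("trips_overview", "dashboard", "dashboard_service", "Trip KPIs", 45))
catalog.ingest(Entity("rider_profiles", "dataset", "crawler", "Rider attributes", 80))

print([e.name for e in catalog.search("trips")])
```

Ranking results by a usage signal is one simple way to surface the entities people rely on most; a production catalog would likely combine several signals and persist its index rather than keep it in memory.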
Since its launch in 2016, the Databook platform has changed significantly to provide greater flexibility and extensibility, as well as an improved user experience. The redesigned "Databook 2.0" helps users cut through the noise while still letting them comb through every detail when necessary.
Centralized data catalogues are essential for companies facing ever-increasing volumes of distributed and complex data. Uber's Databook provides a unified view of the company's data ecosystem and continues to evolve and grow with the company.
By powering scalable data discovery and exploration, Databook helps Uber better manage and use its own data assets and supports the company's data-driven operations worldwide.
Made by Anton Vasetenkov.