Youxian Tao is currently a master's degree candidate at the School of Information, Renmin University of China. He received his B.S. degree in computer science and technology from Hohai University in 2013. His current research interests are adaptive layout optimization and OLAP systems. He and his team members developed a data layout optimization framework that improves the I/O performance of wide tables stored in columnar formats on HDFS, and published a paper at ICDE 2018. Youxian Tao is not purely a programmer: besides writing code, he also enjoys cooking and travelling.

  • Chinese CV of Youxian Tao.

  • Research

    2020.01 - 2020.04 Layout Optimization Based on Deep Reinforcement Learning

    This research will be open-sourced soon.

    2017.12 - Present Adaptive Hybrid Storage Optimization for Wide Tables Analysis

    This research is currently on hold.

    2017.10 - 2018.08 Indexing and Ingestion for Real-time Analytics of ID-associated Data

    Many types of log data, such as Internet access logs, grow rapidly. Indexing and ingesting large amounts of data within a few seconds while supporting real-time query evaluation on that data is a major challenge. Drawing on the LSM-Tree, which is widely used in key-value stores, we proposed a lightweight yet powerful log data indexing scheme that can ingest 500+ million tuples per node per second and support real-time lookups on multiple indexed dimensions.
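The LSM-Tree idea mentioned above can be illustrated with a minimal sketch. It is not the actual indexing scheme of this work: the class `TinyLogIndex`, its `flush_threshold` parameter, and the single-key lookup are all hypothetical simplifications. Writes go to an in-memory buffer, full buffers are frozen into immutable sorted runs, and a lookup searches every run plus the buffer.

```python
import bisect
from collections import defaultdict


class TinyLogIndex:
    """Toy LSM-style index (hypothetical): a write buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold
        self.buffer = defaultdict(list)  # key -> records, absorbs incoming writes
        self.buffered = 0                # number of records currently buffered
        self.runs = []                   # list of (sorted_keys, records) pairs

    def ingest(self, key, record):
        """Append a record to the buffer; flush once the buffer is full."""
        self.buffer[key].append(record)
        self.buffered += 1
        if self.buffered >= self.flush_threshold:
            self._flush()

    def _flush(self):
        """Freeze the buffer into a run sorted by key, then reset the buffer."""
        keys, records = [], []
        for key in sorted(self.buffer):
            for record in self.buffer[key]:
                keys.append(key)
                records.append(record)
        self.runs.append((keys, records))
        self.buffer = defaultdict(list)
        self.buffered = 0

    def lookup(self, key):
        """Return all records for a key, oldest run first, buffer last."""
        hits = []
        for keys, records in self.runs:
            lo = bisect.bisect_left(keys, key)
            hi = bisect.bisect_right(keys, key)
            hits.extend(records[lo:hi])
        hits.extend(self.buffer.get(key, []))
        return hits


index = TinyLogIndex(flush_threshold=2)
index.ingest("user-1", "GET /a")
index.ingest("user-2", "GET /b")   # second write triggers a flush
index.ingest("user-1", "GET /c")   # stays in the buffer
user1_hits = index.lookup("user-1")
```

A real system would also merge runs in the background and maintain one such structure per indexed dimension; both are omitted here.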

    2019.09 - 2019.10 Adaptive Data Layout Optimization of Very Large and Wide Tables

    In this ongoing work, emerging hardware and more extensive workloads are considered in the data layout optimization of very large and wide tables. With an adaptive framework, query workloads benefit from a scalable and self-tuning storage layout on HDFS.

    Projects

    Pixels

    - A flexible column storage format with adaptive optimization techniques embedded.
    This project will be open-sourced soon.

    Paraflow

    - A real-time analytical system for ID-associated data.
    Paraflow enables users to load data into a data warehouse (such as HDFS) as quickly as possible, and provides real-time analysis over both the data being loaded and the data already in the warehouse.

    • Fast loading. Paraflow utilizes a well-designed pipeline for efficient data loading.
    • Lossless staging. Kafka is used in the system to stage data without loss.
    • Real-time analysis. Lightweight indices are used in Paraflow to speed up queries.

    Rainbow

    - A data layout optimization framework for wide tables stored on HDFS.
    Rainbow is an ETL tool that adaptively improves the I/O performance of HDFS column stores by reducing disk seek costs. Users can interact with Rainbow to monitor the optimization process in an ETL pipeline.
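To make the link between column order and seek cost concrete, here is a small sketch. It is a deliberate simplification and not Rainbow's actual cost model: the functions `seek_count` and `workload_cost`, the column names, and the sample workload are all assumptions for illustration. Each query reads a set of columns, and every contiguous run of wanted columns in the on-disk order is counted as one seek.

```python
def seek_count(layout, query_columns):
    """Count the contiguous runs of wanted columns in the given column order."""
    wanted = set(query_columns)
    runs = 0
    in_run = False
    for column in layout:
        if column in wanted:
            if not in_run:
                runs += 1      # a new run starts, costing one seek
                in_run = True
        else:
            in_run = False     # a skipped column ends the current run
    return runs


def workload_cost(layout, workload):
    """Total seeks over all queries in the workload for this column order."""
    return sum(seek_count(layout, query) for query in workload)


# Hypothetical workload: columns a and c are usually read together, as are b and d.
workload = [("a", "c"), ("a", "c"), ("b", "d")]
original = ["a", "b", "c", "d"]
reordered = ["a", "c", "b", "d"]

original_cost = workload_cost(original, workload)    # 6 seeks
reordered_cost = workload_cost(reordered, workload)  # 3 seeks
```

Placing co-accessed columns next to each other halves the seek count in this toy workload; a realistic model would also weigh column sizes and seek distances.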

    Internship

    Intern Development Engineer

    Ant Financial | 2019.05 - Present
    • Ant BlockChain Group.
    • Mentor: Ying Yan
    • Main Works:
      • Mycloak: release iterations of the trusted computing plugin and automation of the testing process.
      • Mykms: multiparty key management, private deployment of keys, and the admin management platform.

    Big Data R & D Intern

    ByteDance | 2019.04 - 2019.05
    • Data Platform [Big data query and analysis].
    • Mentor: Dongdong Guo
    • Main Works:
      • Responsible for the ClickHouse service.
      • Updated the ETL tool and iteratively developed new features for it.
      • Served BI query needs and handled on-call duty.

    Software Development Intern

    ZhuiCan Technology | 2016.07 - 2016.08
    • Technology Department.
    • Mentor: Ming Li
    • Main Works:
      • Participated in the "cloud business intelligence prototype system" project.
      • Frontend: data visualization.
      • Backend: development of the WeChat API interface.

    Publications

    ICDE'18 Demo Rainbow: Adaptive Layout Optimization for Wide Tables.

    - International Conference on Data Engineering. (CCF A)
    - Haoqiong BIAN, Youxian TAO, Guodong JIN, Yueguo CHEN, Xiongpai QIN, Xiaoyong DU.
    - paper poster video code

    MICROPROCESSORS'17 Design of Information Reporting System for Residential District Based on WeChat Public Platform (Chinese)

    - MICROPROCESSORS.
    - Tao Youxian, Wu Longying, Bai Hongxi, Mou Yan.
    - paper poster scene

    Contests

    Programming The First PolarDB Database Performance Competition

    Alibaba Tianchi | 2018.12

    With Range as the core function, designed the storage solution: separated key-value data, divided the data into multiple DB shards, and performed full data scans using a producer-consumer model.

    Innovative 2018 Alibaba Cloud Global Blockchain Competition

    Alibaba Tianchi | 2018.07

    Designed and implemented a "blockchain-based medical record service applet" that protects user privacy through an encryption and decryption mechanism.