EngineConn Metrics reporting feature

    1.2 Goals

    • Add an RPC protocol that carries resource, progress, and additional information, so that all of it can be reported in a single request
    • Refactor the existing resource and progress reporting paths, combining these reports into one request

    2. Overall Design

    This requirement involves the computation-engineconn, Orchestrator, and entry modules. The reporting logic is added and refactored in the computation-engineconn module, and the reported information is parsed and stored on the entry side.

    2.1 Technical Architecture

    The engine information reporting architecture is shown in the figure. After the user submits a task to the entry, the entry requests an engine from LinkisManager. Once the engine is obtained, the entry submits the task to it and receives periodic reports on the task's resources, progress, and status until the task finishes. The entry then returns the final result when the user queries it. This change touches three places:

    • entry: add the engine metrics information and store it in the database
    • Orchestrator: combine the Resource and Progress interface information, and add additional information such as metrics
    • ComputationEngineConn (interactive engine side): combine the reported resource and progress information, and additionally report engine statistics
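    The entry-side half of this flow can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Linkis implementation: the fields of `TaskRunningInfo`, plus `JobRecord` and `EntryReportHandler`, are hypothetical, and an in-memory dict stands in for the database. It shows one combined report updating progress, resources, and engine metrics on a job record in a single step.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskRunningInfo:
    # Hypothetical shape of the combined report sent by the engine
    exec_id: str
    progress: float
    resource: Dict[str, int]
    engine_instance: str
    task_count: int


@dataclass
class JobRecord:
    # Hypothetical stand-in for the job row that the entry persists
    exec_id: str
    progress: float = 0.0
    resource: Dict[str, int] = field(default_factory=dict)
    metrics: Dict[str, object] = field(default_factory=dict)


class EntryReportHandler:
    """Hypothetical entry-side consumer: one combined report updates one job record."""

    def __init__(self) -> None:
        # In-memory dict used in place of the real database table
        self.jobs: Dict[str, JobRecord] = {}

    def on_running_info(self, info: TaskRunningInfo) -> JobRecord:
        job = self.jobs.setdefault(info.exec_id, JobRecord(info.exec_id))
        # Assumption: progress never decreases, so a stale report does not move it back
        job.progress = max(job.progress, info.progress)
        # The latest resource snapshot replaces the previous one
        job.resource = dict(info.resource)
        # Engine metrics arrive in the same message as progress and resources
        job.metrics = {
            "engineInstance": info.engine_instance,
            "taskCount": info.task_count,
        }
        return job


handler = EntryReportHandler()
handler.on_running_info(TaskRunningInfo("exec-1", 0.3, {"cores": 2}, "engine-a:9101", 1))
job = handler.on_running_info(TaskRunningInfo("exec-1", 0.2, {"cores": 2}, "engine-a:9101", 1))
print(job.progress)  # 0.3
```

    Keeping progress monotonic is just one way to tolerate out-of-order reports; the design above does not state how the real entry handles them.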

    Core execution flow

    - [input] The input is the interactive engine computation-engineconn. When the engine executes a task, it reports the running information TaskRunningInfo, which contains the original TaskProgressInfo and the resource information, plus the engine instance information and the number of tasks currently on the engine.
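    A minimal sketch of how the engine side might assemble one TaskRunningInfo per report, assuming Python dataclasses in place of the real RPC protocol classes. Only the names TaskRunningInfo and TaskProgressInfo come from this document; the field names, the `TaskResourceInfo` class, and the `build_running_info` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskProgressInfo:
    # Hypothetical progress snapshot: fraction complete in [0.0, 1.0]
    exec_id: str
    progress: float


@dataclass
class TaskResourceInfo:
    # Hypothetical resource snapshot, e.g. {"memory_mb": 2048, "cores": 2}
    exec_id: str
    resource: Dict[str, int]


@dataclass
class TaskRunningInfo:
    # Combined report: progress + resources + engine-level information in one message
    exec_id: str
    progress: float
    resource: Dict[str, int]
    engine_instance: str
    task_count: int
    extra_metrics: Dict[str, str] = field(default_factory=dict)


def build_running_info(progress: TaskProgressInfo,
                       resource: TaskResourceInfo,
                       engine_instance: str,
                       task_count: int) -> TaskRunningInfo:
    """Merge the two per-task snapshots and attach engine instance and task count."""
    if progress.exec_id != resource.exec_id:
        raise ValueError("progress and resource snapshots belong to different tasks")
    return TaskRunningInfo(
        exec_id=progress.exec_id,
        progress=progress.progress,
        resource=dict(resource.resource),  # copy so later engine updates don't leak in
        engine_instance=engine_instance,
        task_count=task_count,
    )


info = build_running_info(
    TaskProgressInfo("exec-1", 0.5),
    TaskResourceInfo("exec-1", {"memory_mb": 2048, "cores": 2}),
    engine_instance="engine-host:9101",
    task_count=3,
)
print(info.engine_instance, info.task_count)  # engine-host:9101 3
```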

    [Figure: engineconn-mitrics-2.png]

    4. Data structure

    This requirement adds the RPC protocol TaskRunningInfo; no database table is added.

    6. Non-functional Design

    6.1 Security

    The RPC interfaces use internal authentication and do not involve external security issues.

    6.2 Performance

    The two RPC interfaces are combined into one, which reduces the number of reports and improves performance.
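    The saving can be illustrated with simple arithmetic. This assumes, hypothetically, that progress and resources were previously reported by two separate RPCs on the same interval and that every running task reports once per interval; under that assumption, merging them halves the message count. The load figures below (100 tasks, a 600 s run, a 3 s interval) are made up for illustration.

```python
def report_messages(tasks: int, duration_s: int, interval_s: int, messages_per_cycle: int) -> int:
    """Total report RPCs sent while `tasks` tasks run for `duration_s` seconds."""
    cycles = duration_s // interval_s  # complete report cycles per task
    return tasks * cycles * messages_per_cycle


# Hypothetical load: 100 concurrent tasks, a 600 s run, one report cycle every 3 s
before = report_messages(100, 600, 3, messages_per_cycle=2)  # separate progress + resource RPCs
after = report_messages(100, 600, 3, messages_per_cycle=1)   # one combined TaskRunningInfo RPC
print(before, after)  # 40000 20000
```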

    6.3 Capacity

    The volume of metrics information is small, so it has no impact on capacity.

    6.4 High Availability