Data Sync

Overview

The Data Sync module is responsible for synchronizing data between data sources and content repositories: it handles the maintenance of data sources and bidirectional synchronization between the two. Planned data sources and related components include:

  • Local data sources (Excel files will be supported first)
  • Unreal Engine data sources
  • Google Sheets (not currently in development)
  • Database (not currently in development)
  • Git Repository (not currently in development)
  • Message notification system
  • File service
  • Business database
  • Content database
  • Message queue

Data Source Maintenance and Synchronization

A data source is associated with exactly one Repository, and under certain conditions the content of the data source and the Repository can be converted into each other. For local data sources, synchronization from the Repository back to the data source corresponds to file export (file restoration) in localization terms. For game engines, the engine runs in an internal environment, so it must first push content to a buffer area, from which the content is then imported into the Repository. For this reason, each game engine data source is configured with a unique ID so that the system can identify the source when the engine pushes data.
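
To make this concrete, here is a minimal TypeScript sketch of how a data source record might be modeled. The type values and config fields follow the API and database definitions later in this document; the overall shape and the `sync_id` field name are assumptions for illustration only.

typescript
// Illustrative sketch only; field names are assumptions, not the final schema.
type LocalFileConfig = {
  file_type: "excel";           // currently the only supported file type
};

type GameEngineConfig = {
  engine_type: "unreal";        // currently the only supported engine
  sync_id: string;              // generated by the system, configured in the engine plugin
};

type DataSource = {
  id: string;
  name: string;
  repository_id: string;        // each data source belongs to exactly one Repository
  type: "LOCAL_FILE" | "GAME_ENGINE";
  config: LocalFileConfig | GameEngineConfig;
};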

Local Data Sources

For local data sources, first create a Data Source in the Repository: define its type as local file, specify the file type (currently only Excel is supported), upload the file, and configure the definitions of the Excel columns to import, the import order of the files, and other information. The service then checks the current file for potential problems under the current import rules. Once the user has ignored or resolved these issues, the data source service starts the synchronization task. Each task generates a corresponding import report. Synchronization tasks support rollback; after a rollback, the imported content is removed from the content repository.

Data Source and Mapping Validation Rules

After the user uploads the file and sets up the mapping, the service validates it according to the following rules (a minimal validation sketch follows the list):

  • The Key column specified in the rows of the file cannot have empty values (rows with empty values will be ignored and not imported)
  • The Key column specified in the rows of the file cannot have duplicate values (only the first occurrence of a Key will be imported, others will be ignored)
  • The Source column specified in the rows of the file cannot have empty values (rows with empty values will be ignored and not imported)
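
A minimal sketch of how these rules could be applied before import, assuming a simplified row shape (all names below are illustrative, not the actual implementation):

typescript
// Minimal validation sketch for the rules above; row shape and names are assumptions.
type Row = { rowNumber: number; key: string; source: string };

type ValidationIssue = { rowNumber: number; reason: string };

function validateRows(rows: Row[]): { importable: Row[]; ignored: ValidationIssue[] } {
  const seenKeys = new Set<string>();
  const importable: Row[] = [];
  const ignored: ValidationIssue[] = [];

  for (const row of rows) {
    if (!row.key.trim()) {
      ignored.push({ rowNumber: row.rowNumber, reason: "empty Key" });        // rule 1: empty Key is ignored
    } else if (seenKeys.has(row.key)) {
      ignored.push({ rowNumber: row.rowNumber, reason: "duplicate Key" });     // rule 2: only the first occurrence is imported
    } else if (!row.source.trim()) {
      ignored.push({ rowNumber: row.rowNumber, reason: "empty Source" });      // rule 3: empty Source is ignored
    } else {
      seenKeys.add(row.key);
      importable.push(row);
    }
  }
  return { importable, ignored };
}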

Synchronization Behaviors

A data source has two ends, upstream and downstream; here the upstream is a file collection and the downstream is a Repository. The following synchronization behaviors are supported (an incremental-comparison sketch follows the list):

  • Full synchronization: All files in the upstream file collection are synchronized to the downstream. In full synchronization, the processor parses all files, converts their content into Repository content according to the definitions in the files, and saves it to the Repository. Each record in the Repository carries source information, including the file identifier, sheet identifier, row number, string ID, and string content. Through this source information, content can be reverse-imported into the upstream file collection.
  • Incremental synchronization: New and modified files in the upstream file collection are synchronized to the downstream. In incremental synchronization, the processor parses all files and compares them with the IDs of the content in the Repository. If a matching ID is found, the content is considered to exist: if the source text is also identical, the record is skipped; if the source text differs, the content is marked as updated. Otherwise the content is considered new. The processor then converts the file content into Repository content according to the definitions in the files, saves it to the Repository, and records source information as well.
  • Reverse synchronization: In reverse synchronization, the processor traverses all content, finds the matching source information in the content repository, and fills in the translations in the corresponding columns. If information other than the translation has been modified, it is also written back to the file.
  • Version management: Each synchronization task generates a version, with a version number derived from the time. The content repository always operates on the latest version, but if an import error or misoperation occurs, it can be rolled back to a previous version. The system retains the 3 most recent versions.
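
A minimal sketch of the incremental comparison described above, assuming simplified record shapes (the structures and names are illustrative only):

typescript
// Sketch of the incremental comparison: classify parsed records as new, updated, or unchanged.
type ParsedRecord = { key: string; sourceText: string };
type RepoContent = { contentKey: string; sourceText: string };

type DiffResult = { created: ParsedRecord[]; updated: ParsedRecord[]; unchanged: ParsedRecord[] };

function diffAgainstRepository(parsed: ParsedRecord[], existing: RepoContent[]): DiffResult {
  const byKey = new Map(existing.map((c) => [c.contentKey, c]));
  const result: DiffResult = { created: [], updated: [], unchanged: [] };

  for (const record of parsed) {
    const current = byKey.get(record.key);
    if (!current) {
      result.created.push(record);               // no matching ID: treat as new content
    } else if (current.sourceText === record.sourceText) {
      result.unchanged.push(record);             // ID and source text match: skip
    } else {
      result.updated.push(record);               // ID matches but source text changed: mark updated
    }
  }
  return result;
}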

Regular Export

Regular export refers to exporting a version of a content repository, or a filtered subset of its content, as CSV in a system-defined format, regardless of the content repository's data source.
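
A minimal sketch of such an export, assuming a simplified row shape and a key/source/translations column layout (the exact system-defined format may differ):

typescript
// Rough sketch of a CSV export; the column layout is an assumption.
type ExportRow = { key: string; source: string; translations: Record<string, string> };

function toCsv(rows: ExportRow[], languages: string[]): string {
  const escape = (v: string) => `"${v.replace(/"/g, '""')}"`;
  const header = ["key", "source", ...languages].map(escape).join(",");
  const lines = rows.map((r) =>
    [r.key, r.source, ...languages.map((lang) => r.translations[lang] ?? "")].map(escape).join(",")
  );
  return [header, ...lines].join("\n");
}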

Flow Chart

(UML diagram)

API Design

  1. Create Local Data Source API

    • Description: Create a local data source in the Repository, define the type as local file, and configure initial information.
    • Method: POST
    • URL: /api/v1/datasources
    • Request Example:
      json
      {
        "name": "Example Data Source",
        "type": "LOCAL_FILE",
        "repository_id": "repository_uuid",
        "config": {
          "file_type": "excel"
        }
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Data source created successfully",
        "data": {
          "datasource_id": "generated_uuid"
        }
      }
  2. File Association API

    • Description: Upload Excel files through the file service and associate file ID and other information with the data source.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/file
    • Request Example:
      json
      {
        "file_id": ["file_id1", "file_id2"]
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "File uploaded successfully"
      }
  3. Column Definition Configuration API

    • Description: Configure the definitions and mapping relationships of columns in Excel files for subsequent data conversion processing.
    • Method: PATCH
    • URL: /api/v1/datasources/{datasourceId}/columns
    • Request Example:
      json
      {
        "columns": [
          {
            "fileId": "file_id1",
            "sheetName": "sheet_name1",
            "column_name": "ENG",
            "data_type": ["source"]
          },
          {
            "fileId": "file_id1",
            "sheetName": "sheet_name1",
            "column_name": "CHINESE",
            "data_type": ["target", "zh-CN"]
          }
        ]
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Column definition configured successfully"
      }
  4. Sync Task Trigger API (Manual Sync)

    • Description: Manually trigger the synchronization task of the data source to start the data conversion and import process.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/sync
    • Request Example:
      json
      {
        "options": {
          "start_time": "2024-01-01 00:00:00",
        }
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync task triggered",
        "data": {
          "task_id": "sync_task_uuid"
        }
      }
  5. Sync Task Cancellation API

    • Description: Cancel a synchronization task that is in progress or not completed.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/sync/cancel
    • Request Example:
      json
      {
        "task_id": "sync_task_uuid"
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync task cancelled"
      }
  6. Sync Report Query API

    • Description: Query the synchronization report of a specified task to obtain the status and detailed report of the synchronization task.
    • Method: GET
    • URL: /api/v1/datasources/{taskId}/sync-report
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync report query successful",
        "data": {
          "reports": [
            {
              "status": "SUCCESS",  // Possible values: RUNNING, FAILED, SUCCESS
              "report": {}
            }
          ]
        }
      }
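
As a rough illustration, the sketch below chains the APIs above for a local Excel data source. The base URL and the plain fetch helper are assumptions, and error handling is omitted.

typescript
// End-to-end sketch of setting up and syncing a local data source via the APIs above.
const BASE_URL = "https://example.com"; // assumption: service base URL

async function call<T>(method: string, path: string, body?: unknown): Promise<T> {
  const res = await fetch(`${BASE_URL}${path}`, {
    method,
    headers: { "Content-Type": "application/json" },
    body: body ? JSON.stringify(body) : undefined,
  });
  return res.json() as Promise<T>;
}

async function syncLocalExcel(repositoryId: string, fileIds: string[]) {
  // 1. Create the data source
  const created = await call<{ data: { datasource_id: string } }>("POST", "/api/v1/datasources", {
    name: "Example Data Source",
    type: "LOCAL_FILE",
    repository_id: repositoryId,
    config: { file_type: "excel" },
  });
  const dsId = created.data.datasource_id;

  // 2. Associate the uploaded files
  await call("POST", `/api/v1/datasources/${dsId}/file`, { file_id: fileIds });

  // 3. Configure column definitions
  await call("PATCH", `/api/v1/datasources/${dsId}/columns`, {
    columns: [
      { fileId: fileIds[0], sheetName: "sheet_name1", column_name: "ENG", data_type: ["source"] },
      { fileId: fileIds[0], sheetName: "sheet_name1", column_name: "CHINESE", data_type: ["target", "zh-CN"] },
    ],
  });

  // 4. Trigger the sync task and return its ID
  const sync = await call<{ data: { task_id: string } }>("POST", `/api/v1/datasources/${dsId}/sync`, {
    options: { start_time: "2024-01-01 00:00:00" },
  });
  return sync.data.task_id;
}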

Game Engine Data Sources

Game engine data sources currently support only Unreal Engine. First, create a data source in the Repository, define its type as game engine, and specify the engine type (Unreal Engine); a synchronization ID is then generated automatically. The game engine is configured with this synchronization ID through a plugin, and in the plugin you can select the localization strings to push. The plugin then pushes the localization strings to the system's buffer area through the API the system provides. When the Repository administrator receives the push notification, they can preview the pushed strings by downloading them as CSV and then manually synchronize them to the Repository. As with local files, synchronization actions support rollback. Due to the special nature of game engines, reverse synchronization back to the game engine is not currently supported. After localization is complete, the strings can be exported as CSV, JSON, or other structured files according to the user's needs.

Synchronization Behaviors

The two ends of a game engine data source are again upstream and downstream: the upstream is Raw storage (unprocessed localization strings pushed by the game engine) and the downstream is a Repository. The synchronization behaviors mirror those of local file data sources. Because the game engine cannot be connected to directly, pushed data first lands in a storage area, which is then used as the data source for synchronization operations. This effectively transfers responsibility for maintaining the data source to the developer: the developer pushes to the storage area in full or incrementally, which follows the same logic as a Producer uploading Excel files to a data source.

Flow Chart

(UML diagram)

API Design

  1. Create Game Engine Data Source API

    • Description: Create a game engine data source in the Repository and configure initial information.
    • Method: POST
    • URL: /api/v1/datasources
    • Request Example:
      json
      {
        "name": "Example Game Engine Data Source",
        "type": "GAME_ENGINE",
        "repository_id": "repository_uuid",
        "config": {
          "engine_type": "unreal"
        }
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Data source created successfully",
        "data": {
          "datasource_id": "generated_uuid",
        }
      }
  2. Create API Token API

    • Description: Create an API Token for the data source to identify the push client and verify push requests from the game engine.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/access-token
    • Request Example:
      json
      {
        "name": "UE5-Project-A-Token",
        "description": "For Developer-01",
        "expire_time": "2024-01-01T00:00:00Z"  // Optional, never expires if not set
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Token created successfully",
        "data": {
          "id": "token_id",
          "token": "xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
          "name": "UE5-Project-A-Token",
          "status": "active",
          "created_at": "2024-01-01T00:00:00Z",
          "expire_time": "2024-01-01T00:00:00Z"
        }
      }
  3. Query Token List API

    • Description: Get all API Tokens under the data source.
    • Method: GET
    • URL: /api/v1/datasources/{datasourceId}/access-tokens
    • Response Example:
      json
      {
        "code": 200,
        "message": "Query successful",
        "data": {
          "tokens": [
            {
              "id": "token_id",
              "name": "UE5-Project-A-Token",
              "description": "For Developer-01",
              "status": "active",
              "last_used": "2024-01-01T00:00:00Z",
              "created_at": "2024-01-01T00:00:00Z",
              "expire_time": "2024-01-01T00:00:00Z"
            }
          ]
        }
      }
  4. Delete Token API

    • Description: Delete the specified API Token.
    • Method: DELETE
    • URL: /api/v1/datasources/{datasourceId}/access-tokens/{tokenId}
    • Response Example:
      json
      {
        "code": 200,
        "message": "Token deleted successfully"
      }
  5. Localization String Push API

    • Description: Provide localization string push service for game engines. Use an API Token for authentication (a plugin-side push sketch follows this API list).
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/push-localization
    • Headers:
      X-Access-Token: xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      Content-Type: application/json
    • Request Example:
      json
      {
        "localizations": [
          {
            "key": "welcome_message",
            "value": "Welcome to our game!"
          },
          {
            "key": "exit_message",
            "value": "Thank you for playing, goodbye!"
          }
        ],
        "config": {
          "type": "full", // full, incremental
          "push_time": "2024-01-01T00:00:00Z",
          "batch_id": "uuid-v4-string",  // Avoid duplicate submissions
          "other_config": {}
        }
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Localization strings pushed successfully",
        "data": {
          "batch_id": "uuid-v4-string",
        }
      }
  6. Sync Task Trigger API

    • Description: Manually trigger the synchronization task of the game engine data source to start the data synchronization process.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/sync
    • Request Example:
      json
      {
        "options": {
          "start_time": "2024-01-01 00:00:00",
        }
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync task triggered",
        "data": {
          "task_id": "sync_task_uuid"
        }
      }
  7. Sync Task Cancellation API

    • Description: Cancel a synchronization task that is in progress or not completed.
    • Method: POST
    • URL: /api/v1/datasources/{datasourceId}/sync/cancel
    • Request Example:
      json
      {
        "task_id": "sync_task_uuid"
      }
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync task cancelled"
      }
  8. Sync Report Query API

    • Description: Query the synchronization report of a specified task to obtain the status and detailed report of the synchronization task.
    • Method: GET
    • URL: /api/v1/datasources/{taskId}/sync-report
    • Response Example:
      json
      {
        "code": 200,
        "message": "Sync report query successful",
        "data": {
          "reports": [
            {
              "status": "SUCCESS",  // Possible values: RUNNING, FAILED, SUCCESS
              "report": {}
            }
          ]
        }
      }
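
As a rough illustration of the push flow, the sketch below shows how a plugin-side client might call the Localization String Push API defined above. The base URL is an assumption; the header name and payload follow the API definition.

typescript
// Sketch of a plugin-side push using the Localization String Push API above.
const BASE_URL = "https://example.com"; // assumption: service base URL

async function pushLocalizations(
  datasourceId: string,
  accessToken: string,
  localizations: { key: string; value: string }[]
) {
  const res = await fetch(`${BASE_URL}/api/v1/datasources/${datasourceId}/push-localization`, {
    method: "POST",
    headers: {
      "X-Access-Token": accessToken,        // token created via the Create API Token API
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      localizations,
      config: {
        type: "incremental",                // or "full"
        push_time: new Date().toISOString(),
        batch_id: crypto.randomUUID(),      // avoid duplicate submissions
        other_config: {},
      },
    }),
  });
  return res.json();
}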

Database Design

1. Data Source Table (data_sources)

sql
CREATE TABLE data_sources (
    id VARCHAR(36) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    type VARCHAR(20) NOT NULL,  -- LOCAL_FILE, GAME_ENGINE, etc.
    repository_id VARCHAR(36) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'active',  -- active, inactive, deleted
    config JSON NOT NULL,  -- Store specific configurations for different types of data sources
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    created_by VARCHAR(36) NOT NULL,
    updated_by VARCHAR(36) NOT NULL
);

-- Data Source Access Token Table
CREATE TABLE data_source_tokens (
    id VARCHAR(36) PRIMARY KEY,
    data_source_id VARCHAR(36) NOT NULL,
    name VARCHAR(100) NOT NULL,
    token VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(20) NOT NULL DEFAULT 'active',  -- active, inactive, expired
    last_used_at TIMESTAMP,
    expire_time TIMESTAMP,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(36) NOT NULL,
    FOREIGN KEY (data_source_id) REFERENCES data_sources(id)
);

-- Data Source File Mapping Table (for local file types)
CREATE TABLE data_source_files (
    id VARCHAR(36) PRIMARY KEY,
    data_source_id VARCHAR(36) NOT NULL,
    file_id VARCHAR(36) NOT NULL,
    file_ext VARCHAR(20) NOT NULL,  -- xlsx, csv, etc.
    config JSON NOT NULL,  -- Store column mapping and other configuration information
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    FOREIGN KEY (data_source_id) REFERENCES data_sources(id)
);

2. Sync Task Table (sync_tasks)

sql
CREATE TABLE sync_tasks (
    id VARCHAR(36) PRIMARY KEY,
    data_source_id VARCHAR(36) NOT NULL,
    type VARCHAR(20) NOT NULL,  -- full, incremental, reverse
    status VARCHAR(20) NOT NULL,  -- pending, running, completed, failed, cancelled
    version VARCHAR(36) NOT NULL,  -- Associated with version in MongoDB
    progress INT NOT NULL DEFAULT 0,
    error_message TEXT,
    config JSON NOT NULL,  -- Task configuration information
    stats JSON,  -- Statistical information
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(36) NOT NULL,
    FOREIGN KEY (data_source_id) REFERENCES data_sources(id)
);

-- Sync Task Log Table
CREATE TABLE sync_task_logs (
    id VARCHAR(36) PRIMARY KEY,
    task_id VARCHAR(36) NOT NULL,
    level VARCHAR(10) NOT NULL,  -- info, warning, error
    message TEXT NOT NULL,
    metadata JSON,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (task_id) REFERENCES sync_tasks(id)
);

3. Raw Storage Database (raw_storage)

sql
CREATE TABLE raw_storage (
    id VARCHAR(36) PRIMARY KEY,
    data_source_id VARCHAR(36) NOT NULL,
    batch_id VARCHAR(36) NOT NULL,
    type VARCHAR(20) NOT NULL,  -- full, incremental
    status VARCHAR(20) NOT NULL,  -- pending, processed, failed
    content JSON NOT NULL,  -- Raw data content
    metadata JSON,  -- Metadata information
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at TIMESTAMP,
    FOREIGN KEY (data_source_id) REFERENCES data_sources(id)
);

4. MongoDB Content Data Document Structure

javascript
// Content Version Collection (content_versions)
{
  _id: ObjectId(),
  repository_id: "uuid",
  version: 1,               // Numeric version number, starting from 1 and incrementing
  hash: "a1b2c3d",         // Optional: 7-digit short hash for quick reference
  status: "active",        // active, archived
  created_at: ISODate(),
  created_by: "user_id",
  metadata: {
    description: "Version description",
    tags: ["tag1", "tag2"]
  }
}

// Content Collection (contents)
{
  _id: ObjectId(),
  repository_id: "uuid",
  version_id: ObjectId(),  // Associated with version collection
  content_key: "app/name/key",  // Unique content identifier, key defined by user in file
  status: "new",  // new, updated, deleted
  source: {
    text: "Welcome to our game!",
    language: "en-US"
  },
  translations: [{
    language: "zh-CN",
    text: "欢迎使用我们的游戏!",
    status: "translated",  // draft, translated, reviewed
    updated_at: ISODate(),
    updated_by: "user_id"
  }],
  metadata: {
    source_type: "game_engine",  // local_file, game_engine
    source_info: {
      data_source_id: "uuid",
      file_id: "uuid",        // For local files
      sheet_name: "Sheet1",   // For Excel files
      row_number: 1,          // For Excel files
      batch_id: "uuid"        // For game engine pushes
    }
  },
  created_at: ISODate(),
  updated_at: ISODate(),
  created_by: "user_id",
  updated_by: "user_id"
}

// Content History Collection (content_history)
{
  _id: ObjectId(),
  content_id: ObjectId(),  // Associated with content collection
  version_id: ObjectId(),  // Associated with version collection
  type: "translation_update",  // content_create, translation_update, metadata_update
  changes: {
    before: {
      // Content before change
    },
    after: {
      // Content after change
    }
  },
  action: "user_edit",  // user_edit, sync_task
  created_at: ISODate(),
  created_by: "user_id",
}
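
As a rough illustration of how the incremental comparison might read these collections, the sketch below looks up existing content by key within the latest active version using the Node.js MongoDB driver. The connection string and database name are assumptions; the collection and field names follow the document structures above.

typescript
// Sketch: resolve the latest active version, then look up content by key within it.
import { MongoClient } from "mongodb";

async function findExistingContent(repositoryId: string, contentKey: string) {
  const client = new MongoClient("mongodb://localhost:27017"); // assumption: local MongoDB
  await client.connect();
  try {
    const db = client.db("content_repository"); // assumption: database name

    // Resolve the latest active version for the repository
    const version = await db
      .collection("content_versions")
      .find({ repository_id: repositoryId, status: "active" })
      .sort({ version: -1 })
      .limit(1)
      .next();
    if (!version) return null;

    // Look up the content by its key within that version
    return db.collection("contents").findOne({
      repository_id: repositoryId,
      version_id: version._id,
      content_key: contentKey,
    });
  } finally {
    await client.close();
  }
}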