Automatically Translate Blog into English Using Deepseek + Github Actions

Since ChatGPT arrived, AI translation has swept the translation market with unprecedented accuracy. Because this blog was once written in English, many articles mix Chinese and English. Before Deepseek's upcoming price increase takes effect, I spent a few minutes translating all the articles on this blog into English. By adding Github Actions, I also made the translation fully automatic.

Pre-translation Preparation

  1. Confirm that your Hugo version and theme support i18n functionality

As long as your Hugo version isn’t particularly outdated (below 0.60.0), it should support this feature. Most mainstream Hugo themes also support i18n by default. You can check the theme’s yaml/toml file to see if there are multi-language configurations.
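
The version check can be scripted. The sketch below is only an illustration: it assumes the output of `hugo version` contains a version string such as `v0.120.4`, and the helper names `parse_hugo_version` and `is_recent_enough` are hypothetical.

```python
import re

MIN_VERSION = (0, 60, 0)  # threshold mentioned in the text above


def parse_hugo_version(output):
    """Extract (major, minor, patch) from `hugo version` output, or None."""
    match = re.search(r"v?(\d+)\.(\d+)(?:\.(\d+))?", output)
    if not match:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_recent_enough(output, minimum=MIN_VERSION):
    """True when the parsed version is at least `minimum`."""
    version = parse_hugo_version(output)
    return version is not None and version >= minimum
```

The text to pass in can be copied from the terminal after running `hugo version`, or captured with `subprocess.run(["hugo", "version"], capture_output=True, text=True).stdout`.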

  2. Install Python

Download and install the Windows version from the Python official website.

  3. Install the necessary dependencies

Use a command-line tool like Windows Terminal or Git Bash to install the requests and openai libraries. The former sends HTTP requests, and the latter talks to the Deepseek API, which follows the standard OpenAI format. (The script below only imports openai, so requests is optional for it.)

pip install requests
pip install openai
  4. Apply for a Deepseek API key

Go to the Deepseek Open Platform to create an API KEY. Make sure to copy and save it when creating it. (Each new phone number registration comes with a 10 yuan balance. If not, you can top up 10 yuan yourself.)

Write and Run the Python Script

You can directly use my script. Save the following code as a translate.py file, then run the command-line tool in the directory where the py file is saved, and enter python translate.py to start the automatic translation. Usage instructions:

  1. Replace “sk-xxx” with your own Deepseek API KEY.
  2. Replace “D:/hugo/lawtee/content/posts” with your own Hugo article directory.
  3. The fixed translation mappings (CATEGORY_TRANSLATIONS) keep article category names consistent between Chinese and English, where a free translation might pick a different word each time. Adjust them to your own categories.
  4. MAX_TOKENS can be modified or left as is. If your articles are generally short, you can reduce it.
  5. The translation logic here is to traverse all folders in the Hugo content directory. For folders containing only index.md, translate it into index.en.md. If both index.md and index.en.md already exist, no translation is performed.
import os
import re
from openai import OpenAI

# Configuration
API_KEY = "sk-xxx"  # Replace with your DeepSeek API key
CONTENT_DIR = "D:/hugo/lawtee/content/posts"  # Path to Hugo's blog post directory
MAX_TOKENS = 8192  # Upper bound on output tokens per request (passed as max_tokens)
CHUNK_SIZE = 6000  # Maximum characters per chunk (the script slices by characters, not tokens)

# Fixed translation mappings
CATEGORY_TRANSLATIONS = {
    "生活": "Life",
    "法律": "Law",
    "社会": "Society"
}

# Initialize OpenAI client
client = OpenAI(api_key=API_KEY, base_url="https://api.deepseek.com")

# Translation function
def translate_text(text):
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a helpful translator. Translate the following text into English."},
            {"role": "user", "content": text},
        ],
        max_tokens=MAX_TOKENS,  # Specify maximum output length
        stream=False
    )
    return response.choices[0].message.content

# Chunk translation function
def translate_long_text(text, chunk_size=CHUNK_SIZE):
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    translated_chunks = []
    for chunk in chunks:
        try:
            translated_chunk = translate_text(chunk)
            translated_chunks.append(translated_chunk)
        except Exception as e:
            print(f"Chunk translation failed, error: {e}")
            translated_chunks.append("")  # On failure the chunk is left out of the output, so check the log for gaps
    return "".join(translated_chunks)

# Extract and process Front Matter
def process_front_matter(content):
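    # Match a Front Matter block at the very start of the file
    # (re.match anchors at position 0, so a "---" rule later in the body is not mistaken for it)
    front_matter_match = re.match(r"---\n(.*?)\n---", content, re.DOTALL)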
    if not front_matter_match:
        return content  # If no Front Matter, return the original content

    front_matter = front_matter_match.group(1)
    body = content[front_matter_match.end():].strip()

    # Process categories field in Front Matter
    if "categories:" in front_matter:
        for cn_category, en_category in CATEGORY_TRANSLATIONS.items():
            front_matter = front_matter.replace(f"categories: {cn_category}", f"categories: {en_category}")

    # Reassemble Front Matter and body
    return f"---\n{front_matter}\n---\n\n{body}"

# Traverse content directory
for root, dirs, files in os.walk(CONTENT_DIR):
    # Check if index.md exists but index.en.md does not
    if "index.md" in files and "index.en.md" not in files:
        index_md_path = os.path.join(root, "index.md")
        index_en_md_path = os.path.join(root, "index.en.md")

        # Read index.md content
        with open(index_md_path, "r", encoding="utf-8") as f:
            content = f.read()

        # Process Front Matter
        content = process_front_matter(content)

        # Translate content in chunks
        try:
            translated_content = translate_long_text(content)
            # Save translated content as index.en.md
            with open(index_en_md_path, "w", encoding="utf-8") as f:
                f.write(translated_content)
            print(f"Translated and saved: {index_en_md_path}")
        except Exception as e:
            print(f"Translation failed: {index_md_path}, error: {e}")

print("Batch translation completed!")
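
The script slices the text every CHUNK_SIZE characters, which can cut a sentence or a front matter line in half. One possible alternative, offered as an assumption and not as part of the original script, is to split on blank lines and pack whole paragraphs into each chunk. `split_by_paragraph` is a hypothetical helper, and a single paragraph longer than the limit is kept whole as its own oversized chunk.

```python
def split_by_paragraph(text, chunk_size=6000):
    """Group blank-line-separated paragraphs into chunks of at most
    chunk_size characters (a single longer paragraph stays whole)."""
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Tentatively append the paragraph to the chunk being built
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > chunk_size:
            # The chunk would overflow: close it and start a new one
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

If this were used inside translate_long_text, the translated chunks would need to be joined with "\n\n" instead of "" so that paragraph breaks survive.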

Tip
AI translation is generally not very fast, but Deepseek is currently the fastest. My 300+ articles with 800,000 Chinese characters took about five hours to translate, costing 1.7 yuan in API fees.
Hugo’s frontmatter section might fail to translate, especially at the junction between frontmatter and the main text. The translated result might lose the ---. It’s recommended to manually check after translation, as this issue might occur every few articles. You can also debug locally using hugo server to see if there are any errors.
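
Part of the manual check can be automated. The following is a minimal sketch that assumes YAML front matter delimited by `---` lines, as the script above does; the function names are hypothetical. It lists translated files whose delimiters went missing so that only those need to be opened by hand.

```python
import os


def has_front_matter(text):
    """True if text starts with a '---' line and has a later closing '---' line."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() == "---" for line in lines[1:])


def find_broken_translations(content_dir):
    """List index.en.md files whose front matter delimiters are missing."""
    broken = []
    for root, _dirs, files in os.walk(content_dir):
        if "index.en.md" in files:
            path = os.path.join(root, "index.en.md")
            with open(path, "r", encoding="utf-8") as f:
                if not has_front_matter(f.read()):
                    broken.append(path)
    return sorted(broken)
```

This only detects missing delimiters. It does not verify that the fields inside the front matter are still valid, so running hugo server remains a useful second check.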

Enable English Site

Refer to the Hugo theme configuration to enable it. For example, the settings for this site’s hugo.yaml are as follows:

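  1. languageCode is set to zh-Hans, indicating that the default language of the site is Chinese. (Hugo assigns files without a language suffix to the language named by the defaultContentLanguage setting, so that setting should point to the Chinese language key, zh-cn in this config.) All index.md files are served under the default URL prefix https://lawtee.com, and all index.en.md files under https://lawtee.com/en. The same applies to other languages.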
baseurl: https://lawtee.com
languageCode: zh-Hans ## Default is Chinese
languages:
    en:
        languageName: English
        title: Lawtee
        weight: 2
        params:
            sidebar:
                subtitle: Legal Practitioners
    zh-cn:
        languageName: 中文
        title: 法律小茶馆
        weight: 1
        params:
            sidebar:
                subtitle: 基层法律工作者
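
The filename-to-URL mapping described above can be illustrated with a short sketch. It is a simplification and not Hugo's actual implementation: it assumes the default language key is `zh-cn`, that the default language is served at the site root, and that any second extension in the filename is a language code. The function names are hypothetical.

```python
import os


def language_of(filename, default_lang="zh-cn"):
    """Return the language implied by a content filename:
    'index.en.md' -> 'en', 'index.md' -> default_lang."""
    stem, _ext = os.path.splitext(filename)   # 'index.en' or 'index'
    _name, lang = os.path.splitext(stem)      # '.en' or ''
    return lang.lstrip(".") or default_lang


def url_prefix(filename, base_url="https://lawtee.com", default_lang="zh-cn"):
    """Default language lives at the site root, other languages under /<lang>."""
    lang = language_of(filename, default_lang)
    return base_url if lang == default_lang else f"{base_url}/{lang}"
```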

At this point, the local AI translation feature is set up. For future translations, simply run python translate.py locally to automatically translate new index.md files. If you find this cumbersome, you can refer to this article to set up a one-click automatic translation locally. Alternatively, you can use Github Actions’ powerful CI/CD tools to achieve automatic translation. The difference is that local translation results can be checked for bugs before submission, while automatic translation via Github Actions requires waiting for the build to complete to identify any bugs.
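
Before spending API credits on a run, it can help to preview which posts would be picked up. The sketch below mirrors the selection rule of translate.py (index.md present, index.en.md absent) without calling the API; `needs_translation` and `list_pending` are hypothetical names.

```python
import os


def needs_translation(files):
    """True when a folder has index.md but no index.en.md yet."""
    return "index.md" in files and "index.en.md" not in files


def list_pending(content_dir):
    """Return the index.md paths that a translation run would process."""
    pending = []
    for root, _dirs, files in os.walk(content_dir):
        if needs_translation(files):
            pending.append(os.path.join(root, "index.md"))
    return sorted(pending)
```

Calling `list_pending("content/posts")` from the repository root and printing the result gives a dry run of the next translation batch.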

Set Up Github Actions for Automatic Translation

  1. Create a workflow file translate.yml in the Hugo repository, for example, D:/hugo/lawtee/.github/workflows/translate.yml, to define the automation task. The content is as follows:
name: Translate Markdown Files

on:
  push:
    branches:
      - master  # Trigger on push to master branch
  schedule:
    - cron: '0 0 * * *'  # Run daily at UTC 00:00

jobs:
  translate:
    runs-on: ubuntu-latest  # Use Ubuntu environment

    steps:
      # 1. Checkout code
      - name: Checkout repository
        uses: actions/checkout@v3

      # 2. Set up Python environment
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'  # Specify Python version

      # 3. Install dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install openai

      # 4. Run translation script
      - name: Run translation script
        env:
          DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}  
        run: |
          python translate.py

      # 5. Commit changes
      - name: Commit and push changes
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  
        run: |
          git config --global user.name "USERNAME" # Your GitHub username
          git config --global user.email "EMAIL@EMAIL.com" # Your GitHub email
          git add .
          git diff --quiet && git diff --staged --quiet || git commit -m "Auto-translate Markdown files"
          git push
  2. Add a repository secret in the GitHub repository settings: Settings -> Secrets and variables -> Actions. Name it DEEPSEEK_API_KEY and enter your Deepseek API KEY as the value. (API keys should never be committed in files, and GitHub may block pushes that contain detected secrets, so store the key in the repository settings.) Then, in Settings -> Actions -> General, set Workflow permissions to read and write, and allow GitHub Actions to create and approve pull requests.

  3. Push translate.yml and translate.py to the GitHub repository. The workflow runs python translate.py from the repository root, so translate.py should be placed there.

    • Make sure to modify the CONTENT_DIR = "D:/hugo/lawtee/content/posts" in translate.py to the repository path. For example: CONTENT_DIR = "content/posts".
    • Also, change API_KEY = "sk-xxx" in translate.py to API_KEY = os.getenv("DEEPSEEK_API_KEY").
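
A slightly more defensive way to read the key is to fail early when the variable is missing, so the workflow stops with a clear message before any API call is attempted. This is a sketch only; `load_api_key` is a hypothetical helper and not part of the original script.

```python
import os


def load_api_key(env_var="DEEPSEEK_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it as a repository secret or export it locally."
        )
    return key
```

In translate.py this would replace the hard-coded value, for example `API_KEY = load_api_key()`.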

This way, GitHub Actions will automatically translate newly added index.md documents into index.en.md.

Tip
It is recommended to perform the initial AI batch translation locally for easier debugging. You can then use GitHub Actions for automatic translation later. During local testing, I found that batch translation occasionally causes errors in the frontmatter, but this issue has not been encountered when translating single articles.

All textual works on this website are protected by copyright, and the authors reserve all rights. Unless otherwise stated, the photos on this website are licensed under the CC BY-NC-ND 4.0 license.