first commit
54
.gitignore
vendored
Normal file
@@ -0,0 +1,54 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
node_modules
/.pnp
.pnp.js

# testing
/coverage

# production
/build
dist

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Editor directories and files
.idea
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

# mockm
httpData

public/upload/**
!public/upload/*.gitkeep
.history

# Package manager lock file
package-lock.json
yarn.lock
# pnpm-lock.yaml
auto-imports.d.ts
components.d.ts

.wxt
.output
web-ext.config.ts
.wrangler

# vite-plugin-pwa dev output
dev-dist
3
.vscode/settings.json
vendored
Normal file
@@ -0,0 +1,3 @@
{
  "CodeFree.index": true
}
102
README.md
Normal file
@@ -0,0 +1,102 @@
# Project Overview

This repository provides a TypeScript command-line tool that recognizes four-digit CAPTCHA images crossed by two interference lines. By default, the `train/` directory is used for training and the `valid/` directory for validation. At runtime the program reads both datasets: file names in the training set provide the supervision labels used to train a lightweight classifier on the fly, while the validation set measures generalization and produces detailed results.

## Requirements

- Node.js 18 or later (the development environment uses Node.js v22)
- System dependencies: `sharp` depends on libvips, which must be installed on macOS / Linux before `sharp` can compile

Before the first run, make sure the following has been executed in the project directory:

```bash
npm install
```

This installs the main dependencies:

- `tesseract.js`: provides a baseline recognition result (with no interference-line handling) for comparison.
- `sharp`: handles image preprocessing such as grayscale conversion, normalization, thresholding, cropping, and resizing.
- `ts-node` / `typescript`: run the TypeScript sources directly.

## Quick Start

Train on `train/` and validate on `valid/` by default:

```bash
npm run ocr
```

To specify datasets explicitly:

```bash
# Specify the training and validation directories
npm run ocr -- ./my-train ./my-valid

# Specify only the validation directory (training still uses the default train/)
npm run ocr -- ./my-valid
```

When a single custom path is provided and it lacks labels (file names without four consecutive digits), the script treats it as a new validation directory while keeping the default training set. When the command finishes, the terminal prints the accuracy on the training and validation sets, per-file predictions, and the Tesseract.js results for comparison.

## How the Tool Works

1. **Scan the datasets**
   Read every image with extension `.png`, `.jpg`, `.jpeg`, or `.bmp` in each directory. Four consecutive digits in the file name serve as the label; files without such digits only receive predictions and are excluded from the accuracy statistics.

2. **Image preprocessing**
   - Use `sharp` to resize each image to a fixed height (120px) while preserving the aspect ratio.
   - Convert to grayscale and normalize.
   - Apply a fixed threshold to produce a binary mask used to locate the digit regions and interference lines.

3. **Segment the four digits**
   - Count the black pixels in every column and row to find the region that actually contains digit ink, trimming away the pure-white margins.
   - Since each image always contains four digits, split the region horizontally into four equal segments, then shrink each segment inward based on the column statistics so the crop hugs the digit and avoids the interference lines as much as possible.
   - Add a small margin around each digit region and resize it to `20×20` pixels, yielding a 400-dimensional float feature vector (pixel values normalized to 0-1, with darker pixels closer to 1).

4. **Model training**
   - All digit features together with their file-name labels form the training set.
   - A hand-rolled multi-class softmax (logistic regression) model is trained with per-sample gradient descent over randomly shuffled samples for 1500 epochs. Because the dataset is small and the features low-dimensional, training usually takes well under a second.

5. **CAPTCHA recognition and validation**
   - Use the trained weights to infer the four digits for both the training and validation sets.
   - Compute accuracy against the labels and print per-image prediction details.
   - Also run Tesseract.js once (without special preprocessing) and record its output for comparison with the trained model.
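The softmax in step 4 subtracts the maximum logit before exponentiating, which keeps `Math.exp` from overflowing on large scores. A minimal standalone sketch of that trick (simplified from the real implementation, which works on `Float64Array`s):

```typescript
// Numerically stable softmax: exp(x - max) leaves the probabilities unchanged
// because the common factor exp(-max) cancels in the normalization.
function stableSoftmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Even extreme logits stay finite: Math.exp(1000) alone would overflow to Infinity.
const probs = stableSoftmax([1000, 999, 998]);
console.log(probs);
```

The shift by `max` is purely for floating-point safety; mathematically the result is identical.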

## Directory Layout

```
├── package.json     npm configuration with run scripts and dependencies
├── tsconfig.json    TypeScript compiler configuration
├── src/
│   └── ocr.ts       Main program: preprocessing, segmentation, training, and validation
├── train/           Training CAPTCHA images; the file name is the label
├── valid/           Validation CAPTCHA images; the file name is the label
└── README.md        Project documentation (this file)
```

## Recognition Strategy

- **File names as supervision**: labels come directly from file names, so no separate annotation files are needed and the dataset is easy to extend.
- **Interference-line handling**: instead of erasing the lines, the tool crops the true digit region using per-column/per-row ink statistics. Because the lines sit in roughly fixed positions and cover a narrow area, the cropped digits contain almost no residual line pixels.
- **Why softmax**: with only a few dozen images, a simpler model is a more stable one. Logistic regression over 400 pixel features already reaches 100% training accuracy with extremely fast inference.
- **Why call Tesseract.js**: the call is kept only to record how a conventional OCR engine performs on the raw images, serving as a quality comparison and regression baseline.
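The ink-statistics cropping can be illustrated in isolation. This hypothetical helper (not part of `src/ocr.ts`, which inlines the same loops) trims the zero-ink borders of a per-column ink histogram:

```typescript
// Given per-column black-pixel counts, return the inclusive [left, right]
// range that actually contains ink, skipping the blank borders.
function trimToInk(columnInk: number[]): [number, number] {
  let left = 0;
  let right = columnInk.length - 1;
  while (left < columnInk.length && columnInk[left] === 0) left += 1;
  while (right >= 0 && columnInk[right] === 0) right -= 1;
  if (left > right) throw new Error('no ink found');
  return [left, right];
}

// Blank margins on both sides are discarded; only columns 2..5 hold ink.
console.log(trimToInk([0, 0, 3, 5, 1, 2, 0, 0])); // → [2, 5]
```

Applying the same trim per row, and then again inside each of the four equal-width segments, is what lets the crop step past the interference lines.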

## FAQ

| Problem | Solution |
| --- | --- |
| `Module not found: sharp` at runtime, or `sharp` fails to install | Make sure libvips is installed. On macOS use `brew install vips`; on Linux install it via your distribution's package manager. |
| Both `predicted` and `expected` are empty in the output | Check that the file name contains four consecutive digits; the script only trains and validates on such files. |
| Want to persist the trained model for reuse | The dataset is small and training takes almost no time. If persistence is still needed, modify `trainSoftmax` to serialize `weights` to a JSON file after training and load it on the next run. |
| Recognition fails on newly added images | Make sure new images match the existing samples in size and interference-line position; if they differ significantly, parameters such as `TARGET_HEIGHT` or the threshold may need adjustment. |
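The persistence suggestion above can be sketched as follows. The file name `weights.json` and the helper names are illustrative, not part of the current code:

```typescript
import fs from 'node:fs';

// Serialize a Float64Array of trained weights to JSON and load it back.
// JSON has no typed-array support, so the values round-trip as a plain array.
function saveWeights(weights: Float64Array, file: string): void {
  fs.writeFileSync(file, JSON.stringify(Array.from(weights)));
}

function loadWeights(file: string): Float64Array {
  return new Float64Array(JSON.parse(fs.readFileSync(file, 'utf8')));
}

const trained = new Float64Array([0.5, -1.25, 3]);
saveWeights(trained, 'weights.json');
const restored = loadWeights(trained.length ? 'weights.json' : 'weights.json');
console.log(restored);
```

Since every value is a plain double, the round trip is lossless for this model.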

## Possible Extensions

- If the interference lines change shape, or the font differs noticeably, add a dedicated denoising step, e.g. curve-fitting-based line erasure or morphological operations.
- If the dataset grows significantly, replace softmax with a lightweight convolutional network (e.g. based on TensorFlow.js); at the current scale this is unnecessary.
- Add command-line flags to switch thresholds, print prediction probabilities, export trained weights, and so on.
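As a sketch of the last idea (the `--threshold` flag is hypothetical; the current script only accepts positional directories):

```typescript
// Minimal flag-parsing sketch: pull `--threshold=<number>` out of argv,
// leaving positional directory arguments untouched.
function parseArgs(argv: string[]): { threshold: number; dirs: string[] } {
  let threshold = 180; // mirrors the current THRESHOLD default
  const dirs: string[] = [];
  for (const arg of argv) {
    const m = arg.match(/^--threshold=(\d+)$/);
    if (m) {
      threshold = Number(m[1]);
    } else {
      dirs.push(arg);
    }
  }
  return { threshold, dirs };
}

console.log(parseArgs(['--threshold=200', './my-valid']));
```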

## Feedback

To adjust the recognition strategy or extend functionality, edit the corresponding logic in `src/ocr.ts` directly. The code keeps a modular structure (data loading → feature extraction → training → inference), making it easy to insert new processing steps at any stage.
BIN
debug_clean.png
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
eng.traineddata
Normal file
BIN
gray_clean.png
Normal file
After Width: | Height: | Size: 33 KiB |
24
package.json
Normal file
@@ -0,0 +1,24 @@
{
  "name": "digit_cracker",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "directories": {
    "test": "test"
  },
  "scripts": {
    "ocr": "ts-node src/ocr.ts"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "sharp": "^0.34.4",
    "tesseract.js": "^6.0.1"
  },
  "devDependencies": {
    "@types/node": "^24.9.2",
    "ts-node": "^10.9.2",
    "typescript": "^5.9.3"
  }
}
402
src/ocr.ts
Normal file
@@ -0,0 +1,402 @@
import fs from 'node:fs';
import path from 'node:path';

import sharp from 'sharp';
import { createWorker, PSM } from 'tesseract.js';

type DigitSample = {
  label: number;
  feature: Float64Array;
};

type RecognizedResult = {
  file: string;
  expected?: string;
  predicted: string;
  tesseract?: string;
};

type LoadedFile = {
  file: string;
  path: string;
  expected?: string;
  features: Float64Array[];
};

type LoadedDataset = {
  directory: string;
  perDigit: DigitSample[];
  files: LoadedFile[];
};

const TRAIN_DIR_DEFAULT = path.resolve(process.cwd(), 'train');
const VALID_DIR_DEFAULT = path.resolve(process.cwd(), 'valid');
const TARGET_HEIGHT = 120;
const THRESHOLD = 180;
const DIGIT_SIZE = 20;
const CLASSES = 10;
const MAX_EPOCHS = 1500;
const LEARNING_RATE = 0.05;
const L2_LAMBDA = 1e-4;

async function preprocessImage(filePath: string) {
  return sharp(filePath).resize({ height: TARGET_HEIGHT }).greyscale().normalize();
}

async function buildBinaryMask(image: sharp.Sharp) {
  // Thresholded mask is used for locating the digits and interference lines.
  return image.clone().threshold(THRESHOLD).raw().toBuffer({ resolveWithObject: true });
}

async function extractDigits(filePath: string): Promise<Float64Array[]> {
  const image = await preprocessImage(filePath);
  const { data, info } = await buildBinaryMask(image);
  const { width, height } = info;

  const columnInk = new Array<number>(width).fill(0);
  const rowInk = new Array<number>(height).fill(0);

  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      if (data[y * width + x] === 0) {
        columnInk[x] += 1;
        rowInk[y] += 1;
      }
    }
  }

  let left = 0;
  let right = width - 1;
  while (left < width && columnInk[left] === 0) left += 1;
  while (right >= 0 && columnInk[right] === 0) right -= 1;
  if (left >= right) {
    throw new Error(`Unable to find ink columns in ${filePath}`);
  }

  let top = 0;
  let bottom = height - 1;
  while (top < height && rowInk[top] === 0) top += 1;
  while (bottom >= 0 && rowInk[bottom] === 0) bottom -= 1;

  const digitWidth = (right - left + 1) / 4;
  const segments: Array<{ left: number; right: number; top: number; bottom: number }> = [];

  for (let i = 0; i < 4; i += 1) {
    // Snap each segment to the nearest ink to avoid pulling in interference lines.
    let segLeft = Math.floor(left + i * digitWidth);
    let segRight = i === 3 ? right : Math.floor(left + (i + 1) * digitWidth - 1);
    while (segLeft < segRight && columnInk[segLeft] === 0) segLeft += 1;
    while (segRight > segLeft && columnInk[segRight] === 0) segRight -= 1;

    let segTop = top;
    let found = false;
    for (let y = top; y <= bottom && !found; y += 1) {
      for (let x = segLeft; x <= segRight; x += 1) {
        if (data[y * width + x] === 0) {
          segTop = y;
          found = true;
          break;
        }
      }
    }

    let segBottom = bottom;
    found = false;
    for (let y = bottom; y >= top && !found; y -= 1) {
      for (let x = segLeft; x <= segRight; x += 1) {
        if (data[y * width + x] === 0) {
          segBottom = y;
          found = true;
          break;
        }
      }
    }

    segments.push({ left: segLeft, right: segRight, top: segTop, bottom: segBottom });
  }

  const grayscaleBuffer = await image.raw().toBuffer();
  const grayscale = sharp(grayscaleBuffer, { raw: { width, height, channels: 1 } });

  const digits: Float64Array[] = [];
  for (const segment of segments) {
    // Crop with a small margin and resample so every digit maps to DIGIT_SIZE² features.
    const margin = 2;
    const cropLeft = Math.max(0, segment.left - margin);
    const cropRight = Math.min(width - 1, segment.right + margin);
    const cropTop = Math.max(0, segment.top - margin);
    const cropBottom = Math.min(height - 1, segment.bottom + margin);
    const cropWidth = cropRight - cropLeft + 1;
    const cropHeight = cropBottom - cropTop + 1;

    const { data: cropped } = await grayscale
      .clone()
      .extract({ left: cropLeft, top: cropTop, width: cropWidth, height: cropHeight })
      .resize({ width: DIGIT_SIZE, height: DIGIT_SIZE, fit: 'fill', kernel: sharp.kernel.cubic })
      .raw()
      .toBuffer({ resolveWithObject: true });

    const feature = new Float64Array(DIGIT_SIZE * DIGIT_SIZE);
    for (let i = 0; i < cropped.length; i += 1) {
      feature[i] = (255 - cropped[i]) / 255;
    }
    digits.push(feature);
  }

  return digits;
}

function parseLabelFromFilename(fileName: string): string | undefined {
  const match = fileName.match(/\d{4}/);
  return match ? match[0] : undefined;
}

async function loadDirectory(directory: string): Promise<LoadedDataset> {
  const entries = await fs.promises.readdir(directory);
  const imageFiles = entries.filter((entry) => /\.(png|jpe?g|bmp)$/i.test(entry)).sort();
  if (imageFiles.length === 0) {
    throw new Error(`No images found in ${directory}`);
  }

  const perDigit: DigitSample[] = [];
  const files: LoadedFile[] = [];
  for (const fileName of imageFiles) {
    const filePath = path.join(directory, fileName);
    const features = await extractDigits(filePath);
    const expected = parseLabelFromFilename(fileName);
    if (expected) {
      features.forEach((feature, index) => {
        perDigit.push({ label: Number(expected[index]!), feature });
      });
    }
    files.push({ file: fileName, path: filePath, expected, features });
  }

  return { directory, perDigit, files };
}

function softmax(logits: Float64Array): Float64Array {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i += 1) {
    if (logits[i] > max) max = logits[i];
  }

  let sum = 0;
  const output = new Float64Array(logits.length);
  for (let i = 0; i < logits.length; i += 1) {
    const exp = Math.exp(logits[i] - max);
    output[i] = exp;
    sum += exp;
  }
  for (let i = 0; i < output.length; i += 1) {
    output[i] /= sum;
  }
  return output;
}

function shuffleInPlace<T>(array: T[]): void {
  for (let i = array.length - 1; i > 0; i -= 1) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
}

function trainSoftmax(dataset: DigitSample[], inputSize: number): Float64Array {
  const weights = new Float64Array(inputSize * CLASSES).fill(0);
  for (let epoch = 0; epoch < MAX_EPOCHS; epoch += 1) {
    shuffleInPlace(dataset);
    let loss = 0;
    for (const sample of dataset) {
      const logits = new Float64Array(CLASSES);
      for (let c = 0; c < CLASSES; c += 1) {
        let sum = 0;
        const offset = c * inputSize;
        for (let i = 0; i < inputSize; i += 1) {
          sum += weights[offset + i] * sample.feature[i];
        }
        logits[c] = sum;
      }
      const probs = softmax(logits);
      loss += -Math.log(probs[sample.label] + 1e-9);

      for (let c = 0; c < CLASSES; c += 1) {
        const gradient = probs[c] - (c === sample.label ? 1 : 0);
        const offset = c * inputSize;
        for (let i = 0; i < inputSize; i += 1) {
          const delta = gradient * sample.feature[i] + L2_LAMBDA * weights[offset + i];
          weights[offset + i] -= LEARNING_RATE * delta;
        }
      }
    }

    if ((epoch + 1) % 100 === 0) {
      const avgLoss = loss / dataset.length;
      console.log(`epoch ${epoch + 1}: loss=${avgLoss.toFixed(6)}`);
    }
  }
  return weights;
}

function predictDigit(weights: Float64Array, feature: Float64Array, inputSize: number): number {
  let bestClass = 0;
  let bestScore = -Infinity;
  for (let c = 0; c < CLASSES; c += 1) {
    let score = 0;
    const offset = c * inputSize;
    for (let i = 0; i < inputSize; i += 1) {
      score += weights[offset + i] * feature[i];
    }
    if (score > bestScore) {
      bestScore = score;
      bestClass = c;
    }
  }
  return bestClass;
}

async function evaluateDataset(
  label: string,
  files: LoadedFile[],
  weights: Float64Array,
  inputSize: number,
  worker: Awaited<ReturnType<typeof createWorker>>,
): Promise<RecognizedResult[]> {
  let labeledCount = 0;
  let correct = 0;
  const results: RecognizedResult[] = [];

  for (const file of files) {
    const predictedDigits = file.features
      .map((feature) => predictDigit(weights, feature, inputSize))
      .join('');

    let tesseractGuess: string | undefined;
    try {
      const { data } = await worker.recognize(file.path);
      const cleaned = data.text.replace(/\D/g, '');
      tesseractGuess = cleaned || undefined;
    } catch (error) {
      console.warn(`tesseract failed on ${file.file}:`, (error as Error).message);
    }

    if (file.expected) {
      labeledCount += 1;
      if (predictedDigits === file.expected) {
        correct += 1;
      }
    }

    results.push({
      file: file.file,
      expected: file.expected,
      predicted: predictedDigits,
      tesseract: tesseractGuess,
    });
  }

  if (labeledCount > 0) {
    console.log(`[${label}] accuracy: ${correct}/${labeledCount}`);
  } else {
    console.log(`[${label}] no labels provided; skipping accuracy computation.`);
  }

  return results;
}

async function main() {
  const args = process.argv.slice(2);

  let trainDir = TRAIN_DIR_DEFAULT;
  let validDir = VALID_DIR_DEFAULT;
  let trainData: LoadedDataset | undefined;
  let validData: LoadedDataset | undefined;

  if (args.length > 0) {
    const firstPath = path.resolve(args[0]);
    if (!fs.existsSync(firstPath)) {
      throw new Error(`Directory does not exist: ${firstPath}`);
    }
    const firstDataset = await loadDirectory(firstPath);
    if (args.length > 1 || firstDataset.perDigit.length > 0) {
      // A training directory was given explicitly, or the directory contains labels:
      // treat it as the training set.
      trainDir = firstPath;
      trainData = firstDataset;
      if (args.length > 1) {
        validDir = path.resolve(args[1]);
      }
    } else {
      // A single unlabeled directory overrides the validation set while keeping
      // the default training set.
      validDir = firstPath;
      validData = firstDataset;
    }
  }

  if (args.length > 1 && !fs.existsSync(validDir)) {
    throw new Error(`Validation directory does not exist: ${validDir}`);
  }

  if (!trainData) {
    if (!fs.existsSync(trainDir)) {
      throw new Error(`Training directory does not exist: ${trainDir}`);
    }
    trainData = await loadDirectory(trainDir);
  }
  if (trainData.perDigit.length === 0) {
    throw new Error('No labeled samples found in the training set; cannot train the model.');
  }

  const inputSize = DIGIT_SIZE * DIGIT_SIZE;
  const weights = trainSoftmax(trainData.perDigit, inputSize);

  const worker = await createWorker('eng');
  await worker.setParameters({
    tessedit_char_whitelist: '0123456789',
    tessedit_pageseg_mode: PSM.SINGLE_LINE,
    user_defined_dpi: '300',
  });

  console.log('\n--- Training set evaluation ---');
  const trainResults = await evaluateDataset('train', trainData.files, weights, inputSize, worker);

  let validResults: RecognizedResult[] | undefined;
  if (validDir && fs.existsSync(validDir)) {
    try {
      const validStats = validData ?? (await loadDirectory(validDir));
      console.log('\n--- Validation set evaluation ---');
      validResults = await evaluateDataset('valid', validStats.files, weights, inputSize, worker);
    } catch (error) {
      if ((error as Error).message.includes('No images found')) {
        console.log(`\nNo images found in validation directory ${validDir}; skipping validation.`);
      } else {
        throw error;
      }
    }
  } else if (validDir) {
    console.log(`\nValidation directory ${validDir} not found; printing training results only.`);
  }

  await worker.terminate();

  const printResults = (title: string, results: RecognizedResult[]) => {
    console.log(`\n${title} detailed results:`);
    for (const result of results) {
      const parts = [
        `file=${result.file}`,
        result.expected ? `expected=${result.expected}` : undefined,
        `predicted=${result.predicted}`,
        result.tesseract ? `tesseract=${result.tesseract}` : undefined,
      ].filter(Boolean);
      console.log(`  - ${parts.join(' | ')}`);
    }
  };

  printResults('Training set', trainResults);
  if (validResults) {
    printResults('Validation set', validResults);
  }
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
8
todolist.md
Normal file
@@ -0,0 +1,8 @@
Use the most suitable open-source TypeScript OCR library to recognize the images in a folder.

Each image contains two curved interference lines: a white line on top and, below it, a line in the same color as the text. The positions of both lines are roughly fixed across images.

Each image shows 4 Arabic digits.

train/ files: the file name is the image's four-digit answer, usable for training and verification.
valid/ files: the file name is unrelated to the image content; used for validation.
BIN
train/0373.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/0462.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/0693.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/1756.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
train/2490.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/2797.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
train/4406.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
train/4459.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/4705.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/5009.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/5916.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/5984.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/6117.jpeg
Normal file
After Width: | Height: | Size: 1.6 KiB |
BIN
train/6266.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/6343.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/6820.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/6875.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/7151.jpeg
Normal file
After Width: | Height: | Size: 1.6 KiB |
BIN
train/7238.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/7733.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/8107.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
train/8324.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/8930.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
train/9055.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/9064.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/9338.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/9405.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
train/9685.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
train/9988.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
14
tsconfig.json
Normal file
@@ -0,0 +1,14 @@
{
  "compilerOptions": {
    "target": "es2019",
    "module": "commonjs",
    "moduleResolution": "node",
    "types": ["node"],
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "skipLibCheck": true,
    "strict": true,
    "noEmit": true
  },
  "include": ["src"]
}
BIN
valid/YZM-10.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-11.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-12.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-13.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-14.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-15.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-16.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-17.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-18.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-19.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-2.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-20.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-21.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-22.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-23.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-3.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-4.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-5.jpeg
Normal file
After Width: | Height: | Size: 1.7 KiB |
BIN
valid/YZM-6.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-7.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM-8.jpeg
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
valid/YZM-9.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
valid/YZM.jpeg
Normal file
After Width: | Height: | Size: 1.8 KiB |