Sensitive information protection toolkit


godlp

1. Introduction

To help enterprises meet their data security and privacy requirements, godlp provides a set of methods for identifying and handling sensitive data: sensitive data detection algorithms, de-identification (masking) APIs, business-defined configuration options, and the ability to process large volumes of data. godlp can apply a variety of privacy compliance standards to classify and label raw data, determine its sensitivity level, and apply the corresponding masking.

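2. Key capabilities

godlp supports both structured data (JSON, KV data, Go maps) and unstructured data (strings in multiple languages).

2.1 Automatic discovery of sensitive data

DLP ships with a variety of built-in detection rules that identify the sensitive data types present in raw data, so that sensitive information can be handled properly.

2.2 Masking of sensitive data

DLP supports multiple masking algorithms, so a business can apply different masking treatments to sensitive data as needed.

2.3 Business-defined configuration

In addition to the default detection and handling rules, a business can supply custom YAML rules that fit its own situation. DLP then carries out the data processing according to the configuration passed in.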
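
The detect-then-mask flow described in this section can be pictured with a small standalone sketch. It uses only the Go standard library and does not call godlp. The email regex and the names `finding`, `detect`, `maskEmail`, `deidentify`, and `deidentifyMap` are all hypothetical, chosen only to show the idea on a string and on a KV map.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emailRule is a hypothetical detection rule, not one of godlp's built-in rules.
var emailRule = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// finding records one detected value and its byte offsets (hypothetical type).
type finding struct {
	Text       string
	Start, End int
}

// detect returns every match of emailRule in input.
func detect(input string) []finding {
	var out []finding
	for _, loc := range emailRule.FindAllStringIndex(input, -1) {
		out = append(out, finding{Text: input[loc[0]:loc[1]], Start: loc[0], End: loc[1]})
	}
	return out
}

// maskEmail keeps the first character of the local part and the whole domain.
// Addresses with a one-character local part are returned unchanged.
func maskEmail(email string) string {
	at := strings.IndexByte(email, '@')
	if at <= 1 {
		return email
	}
	return email[:1] + strings.Repeat("*", at-1) + email[at:]
}

// deidentify detects first, then returns the masked string and the findings.
func deidentify(input string) (string, []finding) {
	found := detect(input)
	masked := emailRule.ReplaceAllStringFunc(input, maskEmail)
	return masked, found
}

// deidentifyMap applies deidentify to every value of a KV map.
func deidentifyMap(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k], _ = deidentify(v)
	}
	return out
}

func main() {
	masked, found := deidentify("contact: alice@example.com")
	fmt.Println(masked, len(found))
	fmt.Println(deidentifyMap(map[string]string{"mail": "bob@example.org"}))
}
```

A real deployment would take its rules and masking methods from configuration rather than hard-coding a single regex as this sketch does.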

3. Integration

go get github.com/bytedance/godlp@<version>

Sample code is in mainrun/mainrun.go.

Run the following commands from the godlp root directory to build and run:

make
make run
make test
make bench

API description

dlpheader defines the data structures and constants that the godlp SDK needs. The SDK provides the following APIs for detecting and masking sensitive information.

  1. ApplyConfig(conf string) error
  • Applies a configuration passed in as a string.
  2. ApplyConfigFile(filePath string) error
  • Applies the configuration file at filePath.
  3. Detect(inputText string) ([]*DetectResult, error)
  • Detects sensitive information in a string.
  4. DetectMap(inputMap map[string]string) ([]*DetectResult, error)
  • Detects sensitive information in a map[string]string.
  5. DetectJSON(jsonText string) ([]*DetectResult, error)
  • Detects sensitive information in a JSON string.
  6. Deidentify(inputText string) (string, []*DetectResult, error)
  • Detects sensitive information in a string first, then returns the string masked according to the rules, together with the detection results.
  7. DeidentifyMap(inputMap map[string]string) (map[string]string, []*DetectResult, error)
  • Detects sensitive information in a map[string]string first, then returns the map masked according to the rules.
  8. ShowResults(resultArray []*DetectResult)
  • Prints the detection results to the console.
  9. Mask(inputText string, methodName string) (string, error)
  • Masks inputText directly, using the method named methodName from the MaskRules section of the configuration.
  10. Close()
  • Closes the engine object and releases the memory held by its internal objects.
  11. GetVersion() string
  • Returns the DLP SDK version string.
  12. RegisterMasker(maskName string, maskFunc func(string) (string, error)) error
  • Registers a custom masking function.
  13. NewLogProcesser() logs.Processor
  • Creates a log processor for the logs package that masks log output.
  14. MaskStruct(inObj interface{}) (interface{}, error)
  • Masks a struct object directly, following the masking rules named in each field's mask tag.
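
To make the RegisterMasker, Mask, and MaskStruct signatures more concrete, here is a minimal standard-library sketch of how a masker table and tag-driven struct masking could work. It is not godlp's implementation. The `registry` type, the `keepLast4` masker, the `KEEP_LAST4` method name, and the choice to require a pointer to a struct and mask it in place are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// registry is a hypothetical stand-in for an engine's table of masking functions.
type registry struct {
	maskers map[string]func(string) (string, error)
}

func newRegistry() *registry {
	return &registry{maskers: make(map[string]func(string) (string, error))}
}

// RegisterMasker stores maskFunc under maskName and rejects duplicates.
func (r *registry) RegisterMasker(maskName string, maskFunc func(string) (string, error)) error {
	if maskName == "" || maskFunc == nil {
		return errors.New("mask name and function are required")
	}
	if _, dup := r.maskers[maskName]; dup {
		return fmt.Errorf("masker %q already registered", maskName)
	}
	r.maskers[maskName] = maskFunc
	return nil
}

// Mask applies the masking function registered as methodName to inputText.
func (r *registry) Mask(inputText string, methodName string) (string, error) {
	fn, ok := r.maskers[methodName]
	if !ok {
		return "", fmt.Errorf("unknown mask method %q", methodName)
	}
	return fn(inputText)
}

// MaskStruct masks, in place, every settable string field tagged `mask:"<method>"`.
// Assumption of this sketch: inObj must be a non-nil pointer to a struct.
func (r *registry) MaskStruct(inObj interface{}) (interface{}, error) {
	v := reflect.ValueOf(inObj)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return nil, errors.New("MaskStruct expects a non-nil pointer to a struct")
	}
	s := v.Elem()
	for i := 0; i < s.NumField(); i++ {
		method := s.Type().Field(i).Tag.Get("mask")
		f := s.Field(i)
		if method == "" || f.Kind() != reflect.String || !f.CanSet() {
			continue // untagged, non-string, or unexported field: leave as is
		}
		masked, err := r.Mask(f.String(), method)
		if err != nil {
			return nil, err
		}
		f.SetString(masked)
	}
	return inObj, nil
}

// keepLast4 is a hypothetical custom masker; it works on bytes, so it suits ASCII input.
func keepLast4(s string) (string, error) {
	if len(s) <= 4 {
		return strings.Repeat("*", len(s)), nil
	}
	return strings.Repeat("*", len(s)-4) + s[len(s)-4:], nil
}

type user struct {
	Name string
	Card string `mask:"KEEP_LAST4"`
}

func main() {
	r := newRegistry()
	if err := r.RegisterMasker("KEEP_LAST4", keepLast4); err != nil {
		panic(err)
	}
	u := &user{Name: "alice", Card: "6222020012345678"}
	if _, err := r.MaskStruct(u); err != nil {
		panic(err)
	}
	fmt.Println(u.Name, u.Card)
}
```

The func(string) (string, error) shape lets a custom masker report bad input instead of silently returning it unmasked.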

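4. Rule file

See conf.yml for the rules.

The configuration file is in YAML format and has three parts: Global, MaskRules, and Rules.

  1. Global holds settings that affect DLP as a whole, such as the API version, the IDs of disabled rules, and whether to use a backend service to assist detection.
  2. MaskRules configures the masking operations, such as character masking and replacement.
  3. Rules holds the detection and handling rules. Detection runs three stages in order (Detect, Filter, and Verify), and handling refers to the masking rules defined in MaskRules.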
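
The three-stage detection order (Detect, then Filter, then Verify) can be sketched as a small pipeline. This is only an illustration using the Go standard library. The `rule` struct, its field names, the 16-digit regex, the blacklist entry, and the use of a Luhn checksum as the verification step are assumptions, not the keys or logic of conf.yml.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule bundles the three stages; the field names are hypothetical.
type rule struct {
	detect *regexp.Regexp    // Detect: find candidate values
	filter map[string]bool   // Filter: drop candidates found on a blacklist
	verify func(string) bool // Verify: confirm the remaining candidates
}

// luhn reports whether a non-empty digit string passes the Luhn checksum.
func luhn(digits string) bool {
	if digits == "" {
		return false
	}
	sum := 0
	double := false
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// run applies Detect, Filter, and Verify in that order and returns the survivors.
func (r rule) run(input string) []string {
	var hits []string
	for _, cand := range r.detect.FindAllString(input, -1) {
		if r.filter[cand] {
			continue // filtered out by the blacklist
		}
		if r.verify != nil && !r.verify(cand) {
			continue // failed verification
		}
		hits = append(hits, cand)
	}
	return hits
}

// newCardRule builds a hypothetical rule for 16-digit numbers.
func newCardRule() rule {
	return rule{
		detect: regexp.MustCompile(`\b\d{16}\b`),
		filter: map[string]bool{"4111111111111111": true}, // well-known test number
		verify: luhn,
	}
}

func main() {
	input := "a 4111111111111111 b 4539578763621486 c 1234567890123456"
	fmt.Println(newCardRule().run(input))
}
```

In this input the first number is dropped by the filter, the third fails the checksum, and only the second survives all three stages.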

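5. Architecture

godlp is built around the Engine struct, and the Engine object implements the EngineAPI interface. The interface methods are implemented mainly in sdk.go, sdkdeidentify.go, sdkdetect.go, and sdkmask.go. Deidentify and mask operations call further into the detector and mask submodules in the subdirectories.

5.1 Files

  1. sdk.go: implements the EngineAPI methods that are independent of business logic, such as Close().

  2. sdk_test.go: unit test cases.

  3. sdkconfig.go: implements the configuration methods, such as ApplyConfig().

  4. sdkdeidentify.go: implements the de-identification methods.

  5. sdkdetect.go: implements the sensitive information detection methods.

  6. sdkinternal.go: implements the internal functions of the Engine object.

  7. sdkmask.go: implements the direct masking methods.

  8. conf.yml: the built-in default configuration file, containing the rules maintained by DLP.

  9. bindata.go: a data file produced by go generate; it embeds conf.yml.

5.2 Subdirectories

  1. conf: implements the DlpConf struct and handles configuration files.

  2. detector: internal implementation of the sensitive information detection logic.

  3. errlist: list of error messages.

  4. mask: internal implementation of direct masking.

  5. util: helper functions.

  6. dlpheader: interface definitions (header files) for the DLP SDK.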
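
The relationship between Engine and EngineAPI can be sketched as follows. The interface below copies a subset of the method signatures from the API description. Everything else is a toy assumption: the `Text` field of DetectResult, the rule that treats the word "secret" as sensitive, and the version string. The real definitions live in dlpheader and the sdk*.go files.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// DetectResult is a placeholder; the real fields are defined in dlpheader.
type DetectResult struct {
	Text string // hypothetical field holding the matched text
}

// EngineAPI lists a subset of the methods from the API description.
type EngineAPI interface {
	Detect(inputText string) ([]*DetectResult, error)
	Deidentify(inputText string) (string, []*DetectResult, error)
	GetVersion() string
	Close()
}

// Engine is a toy implementation that treats the word "secret" as sensitive.
type Engine struct {
	closed bool
}

// Compile-time check that *Engine satisfies EngineAPI.
var _ EngineAPI = (*Engine)(nil)

var errClosed = errors.New("engine is closed")

// Detect returns one result per occurrence of "secret".
func (e *Engine) Detect(inputText string) ([]*DetectResult, error) {
	if e.closed {
		return nil, errClosed
	}
	var out []*DetectResult
	n := strings.Count(inputText, "secret")
	for i := 0; i < n; i++ {
		out = append(out, &DetectResult{Text: "secret"})
	}
	return out, nil
}

// Deidentify detects first, then returns the masked text and the results.
func (e *Engine) Deidentify(inputText string) (string, []*DetectResult, error) {
	results, err := e.Detect(inputText)
	if err != nil {
		return "", nil, err
	}
	return strings.ReplaceAll(inputText, "secret", "******"), results, nil
}

// GetVersion returns a made-up version string for this sketch.
func (e *Engine) GetVersion() string { return "sketch-0" }

// Close marks the engine as unusable.
func (e *Engine) Close() { e.closed = true }

func main() {
	var api EngineAPI = &Engine{}
	defer api.Close()
	out, results, err := api.Deidentify("my secret is secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(out, len(results))
}
```

Callers that depend only on the interface can swap in a different implementation, for example a stub in tests.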

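6. Acknowledgments

The DLP project would not have come this far without the developers who worked hard on it. Our sincerest thanks go to everyone who has written code for DLP. The names below are listed in no particular order.

  • 丁保增: responsible for the DLP 1.0 detection verification module.
  • 王聪: responsible for the DLP 1.0 official website, modules such as JSON detection and processing, and the onboarding of several projects.
  • 王赛: responsible for the DLP 1.0 de-identification module.
  • 苏宁宁: responsible for DLP 1.0 performance and accuracy testing.
  • 王帅: responsible for the DLP 1.0 API header files.
  • 鲁云飞: responsible for the DLP 1.0 AI module and NLP service.
  • 石岚: responsible for the DLP 1.0 AI module, the big data processing API module, releases, and more.
  • 黄勇辉: responsible for the DLP 1.0 AI module; optimized and updated a large number of rules.
  • 张宇鹏: contributed to the DLP 1.0 AI module.
  • 李赛南: contributed to the DLP 1.0 AI module.
  • 王珩: responsible for the DLP 1.0 format-preserving encryption and order-preserving encryption modules.
  • 夏世文: responsible for DLP 1.0 performance optimization and rule code implementation; carried out most of the joint development work with several projects.
  • 罗同龙: submitted a PR to DLP 2.0 that optimizes log-processing performance.
  • 乔鑫: responsible for the DLP 2.0 server-side code, SDK performance optimization, and technical implementation.
  • 杨经宇: responsible for the overall DLP 1.0 and 2.0 project.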
Issues
  • sdk.go:23:19: undefined: MustAsset

    ➜ godlp git:(main) ✗ go version
    go version go1.17.3 darwin/amd64
    ➜ godlp git:(main) ✗ make
    # github.com/bytedance/godlp
    ./sdk.go:23:19: undefined: MustAsset
    make: *** [release] Error 2

    opened by drone789
  • make error

    ➜  godlp git:(main) make
    go: gopkg.in/yaml.v2@<version>: missing go.sum entry; to add it:
    	go mod download gopkg.in/yaml.v2
    go: gopkg.in/yaml.v2@<version>: missing go.sum entry; to add it:
    	go mod download gopkg.in/yaml.v2
    make: *** [gen] Error 1
    ➜  godlp git:(main) go mod download gopkg.in/yaml.v2
    ➜  godlp git:(main) make
    sdk.go:1: running "go-bindata": exec: "go-bindata": executable file not found in $PATH
    make: *** [gen] Error 1

    opened by alchu4n
Owner
Bytedance Inc.