A small Go script to crawl all THPTQG-2021 (Vietnam's 2021 national high school exam) scores

Crawl all THPTQG-2021 exam scores

A small Go script that crawls all THPTQG-2021 exam scores. I have already crawled everything and uploaded it here:
https://drive.google.com/drive/u/0/folders/1IGb3n_ieBlsfOtND2nTvsB7nlFoYURXO
The folder contains 64 files, one per province or city, plus a total file merged from those 64 files. (For some arcane reason Google Drive would not let me upload the total file with a .csv extension, so if you need it, download it and change the file extension yourself.)

Sample data: (image not available)

Requirements

  1. The source code has been downloaded
  2. Go is installed

Usage

Download dependencies

go mod vendor

Run the program

go run .

or you can build a binary instead

go build .
./crawlscore

Configuring parameters

Some parameters you can change in the .env file

PATCH_SIZE=100
PATCH_DELAY=0.1
OUTPUT_FOLDER=data
TOTAL_FILENAME=total.csv
  • The data source has anti-DoS protection, so to reduce the chance of being blocked the program crawls in sequential batches ("patches"): PATCH_SIZE is the size of each batch, and PATCH_DELAY is the wait before crawling the next batch.
  • OUTPUT_FOLDER is the name of the folder the data is written to.
  • TOTAL_FILENAME is the name of the file that combines all data from the 64 provinces and cities.
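
The batch-and-wait idea behind PATCH_SIZE and PATCH_DELAY can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual implementation: the helpers envInt, envSeconds and batches are hypothetical, PATCH_DELAY is assumed to be in seconds, the candidate range 1..350 is made up, and the sketch reads plain environment variables rather than parsing the .env file.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// envInt reads a positive integer setting from the environment, or returns def.
func envInt(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil && v > 0 {
		return v
	}
	return def
}

// envSeconds reads a duration given in (possibly fractional) seconds, or returns def.
// Assumption: PATCH_DELAY is expressed in seconds.
func envSeconds(key string, def time.Duration) time.Duration {
	if v, err := strconv.ParseFloat(os.Getenv(key), 64); err == nil && v >= 0 {
		return time.Duration(v * float64(time.Second))
	}
	return def
}

// batches splits the inclusive range [first, last] into chunks of at most size items.
func batches(first, last, size int) [][2]int {
	if size < 1 {
		size = 1 // guard against a zero or negative batch size
	}
	var out [][2]int
	for lo := first; lo <= last; lo += size {
		hi := lo + size - 1
		if hi > last {
			hi = last
		}
		out = append(out, [2]int{lo, hi})
	}
	return out
}

func main() {
	size := envInt("PATCH_SIZE", 100)
	delay := envSeconds("PATCH_DELAY", 100*time.Millisecond)

	// Hypothetical candidate range; the real script works out the range itself.
	for _, b := range batches(1, 350, size) {
		fmt.Printf("crawling candidates %d..%d\n", b[0], b[1])
		// A real crawler would fetch every candidate in this batch here.
		time.Sleep(delay) // pause before the next batch to stay under the rate limit
	}
}
```

A larger PATCH_SIZE means fewer pauses but more requests in each burst, which is the trade-off the settings above let you tune.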

A few notes

  1. Only Ho Chi Minh City (province code 02) includes full name, date of birth, and gender, because it is the only one that provides them.
  2. Keep PATCH_SIZE between 100 and 200; for the most stable run I recommend just 100. It will finish in about the length of an afternoon nap.
  3. I cannot guarantee that my algorithm for finding candidate numbers (SBD) is completely accurate, so please let me know about any errors. For example, someone noticed that the data I crawled for province code 35 was incomplete; I crawled it by hand and updated the Google Drive link.
  4. You may get blocked while the program runs; the only fix is probably to change your IP address. I use Cloudflare WARP: each time I get blocked, I just go to Preferences -> Connection -> Reset Encryption Keys.