A library for reading PST files (written in Go/Golang).
Introduction
go-pst is a library for reading PST files (written in Go/Golang).
The PFF (Personal Folder File) and OFF (Offline Folder File) formats are used to store Microsoft Outlook e-mails, appointments and contacts. The PST (Personal Storage Table), OST (Offline Storage Table) and PAB (Personal Address Book) file formats are all based on the PFF format.
The following offsets start from the (node/block) b-tree offset.
64-bit
| Offset | Size | Description |
| --- | --- | --- |
| 0 | 488 | B-tree node entries (number of entries x entry size). |
| 488 | 1 | The number of entries. |
| 490 | 1 | The size of an entry. |
| 491 | 1 | B-tree node level. A zero value represents a leaf node. A value greater than zero represents a branch node, with the highest level representing the root. |
64-bit 4k
| Offset | Size | Description |
| --- | --- | --- |
| 0 | 4056 | B-tree node entries (number of entries x entry size). |
| 4056 | 2 | The number of entries. |
| 4060 | 1 | The size of an entry. |
| 4061 | 1 | B-tree node level. A zero value represents a leaf node. A value greater than zero represents a branch node, with the highest level representing the root. |
32-bit
| Offset | Size | Description |
| --- | --- | --- |
| 0 | 496 | B-tree node entries (number of entries x entry size). |
| 496 | 1 | The number of entries. |
| 498 | 1 | The size of an entry. |
| 499 | 1 | B-tree node level. A zero value represents a leaf node. A value greater than zero represents a branch node, with the highest level representing the root. |
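The three layouts above differ only in their offsets and in the width of the entry count, so they can be read with one routine. The following is a minimal sketch, not go-pst's actual code: the names `nodeLayout`, `nodeMeta` and `parseNodeMeta` are hypothetical, and it assumes multi-byte integers are stored little-endian.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// nodeLayout holds the metadata offsets for one format variant, taken from
// the tables above. The type and its field names are illustrative only.
type nodeLayout struct {
	entriesSize     int // bytes reserved for the node entries
	countOffset     int // offset of "the number of entries"
	countSize       int // 1 byte, or 2 bytes in the 64-bit 4k variant
	entrySizeOffset int // offset of "the size of an entry"
	levelOffset     int // offset of the b-tree node level
}

var (
	layout64   = nodeLayout{488, 488, 1, 490, 491}
	layout644k = nodeLayout{4056, 4056, 2, 4060, 4061}
	layout32   = nodeLayout{496, 496, 1, 498, 499}
)

// nodeMeta is the metadata stored after the entries of a b-tree node.
type nodeMeta struct {
	EntryCount int
	EntrySize  int
	Level      int
}

// IsLeaf reports whether the node is a leaf (level zero).
func (m nodeMeta) IsLeaf() bool { return m.Level == 0 }

// parseNodeMeta reads the metadata of a node. The node slice must start at
// the (node/block) b-tree offset.
func parseNodeMeta(node []byte, l nodeLayout) (nodeMeta, error) {
	if len(node) <= l.levelOffset {
		return nodeMeta{}, errors.New("node buffer too short")
	}
	m := nodeMeta{
		EntrySize: int(node[l.entrySizeOffset]),
		Level:     int(node[l.levelOffset]),
	}
	if l.countSize == 2 {
		// Assumption: multi-byte integers are little-endian.
		m.EntryCount = int(binary.LittleEndian.Uint16(node[l.countOffset:]))
	} else {
		m.EntryCount = int(node[l.countOffset])
	}
	// The entries must fit in the area reserved for them.
	if m.EntryCount*m.EntrySize > l.entriesSize {
		return nodeMeta{}, errors.New("entries exceed the entries area")
	}
	return m, nil
}

func main() {
	node := make([]byte, 512)
	node[488], node[490], node[491] = 3, 24, 0
	m, err := parseNodeMeta(node, layout64)
	fmt.Println(m, m.IsLeaf(), err)
}
```

Keeping the offsets in a small table rather than in three separate functions means the bounds check and the entry-area check are written once.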
Allocation table. This contains allocation count + 1 entries. Each entry is a 16-bit integer holding the byte offset of the beginning of an allocation. The entry for a given HID index is found at page map offset + (2 * hidIndex) + 2 (the page map offset plus the start of the allocation table, at the HID index offset). An extra entry at position allocation count + 1 marks the offset of the next available slot.
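The offset arithmetic above can be sketched as follows. This is an illustration rather than go-pst's implementation: the function name `allocationBounds` is hypothetical, and it assumes the values are little-endian, that the stored offsets are relative to the start of the block passed in, and that the entry following an allocation's own entry marks where that allocation ends.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// allocationBounds returns the start and end byte offsets of the allocation
// identified by hidIndex (assumed to be 1-based).
func allocationBounds(block []byte, pageMapOffset, hidIndex int) (start, end int, err error) {
	if hidIndex < 1 || pageMapOffset < 0 {
		return 0, 0, errors.New("invalid HID index or page map offset")
	}
	// page map offset + (2 * hidIndex) + 2, as described above.
	entryOffset := pageMapOffset + 2*hidIndex + 2
	// Two consecutive 16-bit entries are read: this allocation and the next.
	if entryOffset+4 > len(block) {
		return 0, 0, errors.New("allocation table entry out of range")
	}
	start = int(binary.LittleEndian.Uint16(block[entryOffset:]))
	end = int(binary.LittleEndian.Uint16(block[entryOffset+2:]))
	if start > end || end > len(block) {
		return 0, 0, errors.New("corrupt allocation table entry")
	}
	return start, end, nil
}

func main() {
	block := make([]byte, 64)
	pageMapOffset := 32
	// Three entries for two allocations: 12..20 and 20..32.
	for i, off := range []uint16{12, 20, 32} {
		binary.LittleEndian.PutUint16(block[pageMapOffset+4+2*i:], off)
	}
	fmt.Println(allocationBounds(block, pageMapOffset, 1))
	fmt.Println(allocationBounds(block, pageMapOffset, 2))
}
```

Because the table always holds one more entry than there are allocations, the last allocation can be sized the same way as every other one.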
Table types
| Table type | Description | Features |
| --- | --- | --- |
| 108 | 6c table | Has a b5 table header. |
| 124 | 7c table | Has a b5 table header. |
| 140 | 8c table | Has a b5 table header. |
| 156 | 9c table | Has a b5 table header. |
| 165 | a5 table | |
| 172 | ac table | Has a b5 table header. |
| 181 | b5 table header | B-Tree on Heap |
| 188 | bc table | Property Context (PC/BTH). Has a b5 table header. |
| 204 | cc table | Unknown |
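The table names above are the table type values written in hexadecimal (188 is 0xbc, hence "bc table"). A small sketch of that mapping, with hypothetical helper names `tableName` and `hasB5Header` that are not part of go-pst:

```go
package main

import "fmt"

// tableName returns the name used in the table above, which is the table
// type value written in hexadecimal.
func tableName(tableType byte) string {
	if tableType == 181 {
		return "b5 table header"
	}
	return fmt.Sprintf("%x table", tableType)
}

// hasB5Header reports whether the table above lists the given table type
// as having a b5 table header.
func hasB5Header(tableType byte) bool {
	switch tableType {
	case 108, 124, 140, 156, 172, 188:
		return true
	}
	return false
}

func main() {
	for _, t := range []byte{108, 165, 181, 188, 204} {
		fmt.Printf("%d: %s (b5 header: %t)\n", t, tableName(t), hasB5Header(t))
	}
}
```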
B-Tree-on-Heap
B-Tree-on-Heap header
All tables should have a B-Tree-on-Heap header at HID 0x20 (the start offset of the B-Tree-on-Heap header is in the allocation table). This is the HID User Root from the Heap-on-Node header.
| Offset | Size | Description |
| --- | --- | --- |
| 0 | 1 | Table type. MUST be 181 (b5 table header). |
| 1 | 1 | Size of the B-Tree key value. MUST be 2, 4, 8 or 16. |
| 2 | 1 | Size of the data value. MUST be greater than zero and less than or equal to 32. |
| 3 | 1 | Index depth. |
| 4 | 4 | HID root. Points to the B-Tree-on-Heap entries (offset can be found in the allocation table). |
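The 8-byte header above could be decoded along these lines. This is a sketch under stated assumptions, not go-pst's API: `bthHeader` and `parseBTHHeader` are hypothetical names, the HID root is assumed to be little-endian, and the table type is returned unchecked so the caller can compare it against the expected value.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// bthHeader mirrors the fields of the B-Tree-on-Heap header.
type bthHeader struct {
	TableType  byte
	KeySize    byte
	ValueSize  byte
	IndexDepth byte
	HIDRoot    uint32
}

// parseBTHHeader decodes the header and enforces the key size and data
// value size constraints listed above.
func parseBTHHeader(b []byte) (bthHeader, error) {
	if len(b) < 8 {
		return bthHeader{}, errors.New("B-Tree-on-Heap header needs 8 bytes")
	}
	h := bthHeader{
		TableType:  b[0],
		KeySize:    b[1],
		ValueSize:  b[2],
		IndexDepth: b[3],
		// Assumption: the HID root is stored little-endian.
		HIDRoot: binary.LittleEndian.Uint32(b[4:8]),
	}
	switch h.KeySize {
	case 2, 4, 8, 16:
	default:
		return bthHeader{}, fmt.Errorf("invalid key size %d", h.KeySize)
	}
	if h.ValueSize == 0 || h.ValueSize > 32 {
		return bthHeader{}, fmt.Errorf("invalid data value size %d", h.ValueSize)
	}
	return h, nil
}

func main() {
	raw := []byte{0xb5, 2, 6, 0, 0x40, 0x00, 0x00, 0x00}
	h, err := parseBTHHeader(raw)
	fmt.Printf("%+v %v\n", h, err)
}
```

The HID root returned here is then resolved through the allocation table to find the B-Tree-on-Heap entries.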
Property Context
The property context starts at the HID root of the B-Tree-on-Heap header.
go-pst v4 will require Go 1.18 due to generics support.
Benchmarks (which will be published along with v4) show that initializing the b-trees uses slightly more CPU and memory, but searching for folders/messages uses less.
Would you consider changing the License to something here? I would kindly suggest the MIT license.
pkg.go.dev won't even show documentation for this package since you're using WTFPL, and I imagine it's going to limit its use in some projects. I'd love to add PST support to my own eml2html package.
Thanks for the work in this package and considering this request.
Edit: I noticed the badge on the README actually says MIT, but that doesn't match what's in the actual LICENSE.txt file. Maybe this is just an oversight?
go-pst should expose file offsets to messages and be able to load Message structs from this offset.
This would bypass the b-tree and is useful for indexing.
go-pst currently uses the following to search for messages:
It would be better to use the following so we don't have to traverse the whole tree to find the next message:
We currently return an error when attempting to create an iterator while there are no messages/attachments available. We should return an empty iterator instead.
This is a flaw in my iterator implementation, but it would have been nice to have a standard iterator in Go:
https://github.com/golang/go/discussions/54245
https://github.com/golang/go/discussions/56413
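One way to return an empty iterator instead of an error is sketched below using Go 1.18 generics. The `Iterator` type and its methods are hypothetical and do not reflect go-pst's current iterator; the point is only that a constructor which accepts zero items lets callers use the same loop whether or not messages or attachments exist.

```go
package main

import "fmt"

// Iterator is a minimal generic iterator over a slice of items.
type Iterator[T any] struct {
	items []T
	pos   int // index of the next item to return
}

// NewIterator never fails: a nil or empty slice yields an empty iterator.
func NewIterator[T any](items []T) *Iterator[T] {
	return &Iterator[T]{items: items}
}

// Next advances the iterator and reports whether a value is available.
func (it *Iterator[T]) Next() bool {
	if it.pos >= len(it.items) {
		return false
	}
	it.pos++
	return true
}

// Value returns the current item. It must only be called after Next
// has returned true.
func (it *Iterator[T]) Value() T {
	return it.items[it.pos-1]
}

func main() {
	// A folder without messages: the loop body simply never runs.
	empty := NewIterator[string](nil)
	for empty.Next() {
		fmt.Println(empty.Value())
	}

	subjects := NewIterator([]string{"Hello", "Re: Hello"})
	for subjects.Next() {
		fmt.Println(subjects.Value())
	}
}
```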
go-pst currently reads the PST file(s) sequentially.
It would be nice to use a goroutine pool to walk/read structures such as the b-trees/Property Context in parallel.
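A fixed-size goroutine pool for this could look roughly like the sketch below. It is not a proposal for go-pst's API: `readParallel` and the `read` callback are hypothetical, and the callback stands in for whatever reads a structure at a given file offset.

```go
package main

import (
	"fmt"
	"sync"
)

// readParallel calls read for every offset using a fixed number of worker
// goroutines. Results are returned in the same order as the offsets. If any
// call fails, the first recorded error is returned alongside the results.
func readParallel(offsets []int64, workers int, read func(int64) (int, error)) ([]int, error) {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(offsets))
	jobs := make(chan int) // indexes into offsets

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				v, err := read(offsets[i])
				if err != nil {
					once.Do(func() { firstErr = err })
					continue
				}
				// Each index is written by exactly one worker.
				results[i] = v
			}
		}()
	}
	for i := range offsets {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results, firstErr
}

func main() {
	offsets := []int64{512, 1024, 1536, 2048}
	// Placeholder for reading and decoding a structure at the offset.
	sizes, err := readParallel(offsets, 2, func(off int64) (int, error) {
		return int(off / 512), nil
	})
	fmt.Println(sizes, err)
}
```

If the underlying file is accessed through `io.ReaderAt` (for example `os.File.ReadAt`), the workers can share one file handle, since parallel `ReadAt` calls on the same source are permitted by that interface.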