Implement strategy for reading large files
mikebd committed Nov 3, 2023
1 parent 52bec7a commit ae1187c
Showing 10 changed files with 448 additions and 150 deletions.
170 changes: 82 additions & 88 deletions README.md
@@ -73,6 +73,7 @@ are naturally in `/var/log`.
| Argument | Description |
|------------------|----------------------------------------|
| `-h` \| `--help` | Print the help message |
+| `-l` | Large File Bytes (default 10,000,000) |
| `-lt` | Log with timestamps (UTC) |
| `-n` | Number of lines to return (default 25) |
| `-p` | HTTP Port to listen on (default 8080) |
@@ -129,23 +130,23 @@ wifi.log
```bash
❯ go run ./main -p 80
...
-logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.75945789s
-logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.534780081s
-logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.148826909s
-logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.427393941s
-logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.447307855s
-logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.421464319s
+logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.23877778s
+logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.202586854s
+logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.198800304s
+logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 856.594448ms
+logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 866.171423ms
+logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 863.319268ms
```

☝️ was generated by the following `curl` commands:

```bash
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
-❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
+❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
```

The `curl` output for all of the above is:
@@ -160,128 +161,121 @@ The `curl` output for all of the above is:

### Small file

-`/var/log/daily.out` is a 419KB file with 6,892 lines.
+`/var/log/daily.out` is a 498KB file with 7,992 lines.

```bash
❯ ab -c 6 -k -n 10000 'localhost/api/v1/logs/daily.out'

Document Path: /api/v1/logs/daily.out
-Document Length: 1765 bytes
+Document Length: 1930 bytes

Concurrency Level: 6
-Time taken for tests: 3.193 seconds
+Time taken for tests: 2.062 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
-Total transferred: 19080000 bytes
-HTML transferred: 17650000 bytes
-Requests per second: 3131.67 [#/sec] (mean)
-Time per request: 1.916 [ms] (mean)
-Time per request: 0.319 [ms] (mean, across all concurrent requests)
-Transfer rate: 5835.18 [Kbytes/sec] received
+Total transferred: 20730000 bytes
+HTML transferred: 19300000 bytes
+Requests per second: 4849.47 [#/sec] (mean)
+Time per request: 1.237 [ms] (mean)
+Time per request: 0.206 [ms] (mean, across all concurrent requests)
+Transfer rate: 9817.34 [Kbytes/sec] received

Connection Times (ms)
-min mean[+/-sd] median max
+min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
-Processing: 1 2 0.5 2 5
-Waiting: 0 2 0.5 2 5
-Total: 1 2 0.5 2 5
+Processing: 0 1 0.3 1 3
+Waiting: 0 1 0.3 1 3
+Total: 0 1 0.3 1 3

Percentage of the requests served within a certain time (ms)
-50% 2
-66% 2
-75% 2
-80% 2
-90% 3
-95% 3
-98% 3
-99% 3
-100% 5 (longest request)
+50% 1
+66% 1
+75% 1
+80% 1
+90% 2
+95% 2
+98% 2
+99% 2
+100% 3 (longest request)
```
### Large file with fixed text and regex matching
#### Non-default GOMAXPROCS=8
-When run for 10,000 requests, it failed somewhere between 4,000-5,000 requests with a timeout as response times escalated to over 1 minute.
-For > 2,000 requests (started as 10,000 but interrupted after 2,000):
```bash
-❯ ab -c 6 -k -n 10000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$'
+❯ ab -c 6 -k -n 1000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1'

-Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$
+Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1
Document Length: 117 bytes

Concurrency Level: 6
-Time taken for tests: 1036.804 seconds
-Complete requests: 2105
+Time taken for tests: 259.356 seconds
+Complete requests: 1000
Failed requests: 0
-Keep-Alive requests: 2105
-Total transferred: 545195 bytes
-HTML transferred: 246285 bytes
-Requests per second: 2.03 [#/sec] (mean)
-Time per request: 2955.261 [ms] (mean)
-Time per request: 492.543 [ms] (mean, across all concurrent requests)
-Transfer rate: 0.51 [Kbytes/sec] received
+Keep-Alive requests: 1000
+Total transferred: 259000 bytes
+HTML transferred: 117000 bytes
+Requests per second: 3.86 [#/sec] (mean)
+Time per request: 1556.135 [ms] (mean)
+Time per request: 259.356 [ms] (mean, across all concurrent requests)
+Transfer rate: 0.98 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
-Processing: 1905 2946 1771.9 2766 36280
-Waiting: 1905 2946 1771.9 2766 36280
-Total: 1905 2946 1771.9 2766 36280
+Processing: 909 1550 51.8 1541 1788
+Waiting: 908 1550 51.8 1540 1788
+Total: 909 1550 51.8 1541 1788

Percentage of the requests served within a certain time (ms)
-50% 2766
-66% 2811
-75% 2853
-80% 2885
-90% 3007
-95% 3225
-98% 4481
-99% 6722
-100% 36280 (longest request)
+50% 1541
+66% 1559
+75% 1568
+80% 1578
+90% 1605
+95% 1632
+98% 1692
+99% 1757
+100% 1788 (longest request)
```
#### Default GOMAXPROCS=1
-This fails for `-n 10000` with a timeout, but "succeeds" (with poor performance) for `-n 50`.
There is a lot of room for optimization here if this is a use case that must be supported.
```bash
-❯ ab -c 6 -k -n 50 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$'
+❯ ab -c 6 -k -n 1000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1'

-Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$
+Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1
Document Length: 117 bytes

Concurrency Level: 6
-Time taken for tests: 31.893 seconds
-Complete requests: 50
+Time taken for tests: 279.119 seconds
+Complete requests: 1000
Failed requests: 0
-Keep-Alive requests: 50
-Total transferred: 12950 bytes
-HTML transferred: 5850 bytes
-Requests per second: 1.57 [#/sec] (mean)
-Time per request: 3827.187 [ms] (mean)
-Time per request: 637.865 [ms] (mean, across all concurrent requests)
-Transfer rate: 0.40 [Kbytes/sec] received
+Keep-Alive requests: 1000
+Total transferred: 259000 bytes
+HTML transferred: 117000 bytes
+Requests per second: 3.58 [#/sec] (mean)
+Time per request: 1674.712 [ms] (mean)
+Time per request: 279.119 [ms] (mean, across all concurrent requests)
+Transfer rate: 0.91 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
-Connect: 0 0 0.1 0 0
-Processing: 1832 3469 1508.0 2750 6374
-Waiting: 1832 3469 1508.0 2750 6374
-Total: 1832 3469 1508.0 2750 6374
+Connect: 0 0 0.0 0 0
+Processing: 1276 1663 53.5 1654 2018
+Waiting: 1276 1663 53.5 1653 2018
+Total: 1276 1663 53.5 1654 2018

Percentage of the requests served within a certain time (ms)
-50% 2750
-66% 3738
-75% 5137
-80% 5368
-90% 5905
-95% 6206
-98% 6374
-99% 6374
-100% 6374 (longest request)
+50% 1654
+66% 1667
+75% 1679
+80% 1687
+90% 1714
+95% 1745
+98% 1803
+99% 1932
+100% 2018 (longest request)
```
2 changes: 2 additions & 0 deletions config/arguments.go
@@ -6,6 +6,7 @@ import (

type Arguments struct {
HttpPort int
+LargeFileBytes int64
LogTimestamps bool
NumberOfLogLines int
}
@@ -15,6 +16,7 @@ func ParseArguments() Arguments {
configure(&args)

flag.IntVar(&args.HttpPort, "p", HttpPort, "HTTP port")
+flag.Int64Var(&args.LargeFileBytes, "l", LargeFileBytes, "Large file bytes")
flag.BoolVar(&args.LogTimestamps, "lt", false, "Log timestamps (UTC)")
flag.IntVar(&args.NumberOfLogLines, "n", NumberOfLogLines, "Number of log lines to return")
flag.Parse()
Expand Down
3 changes: 3 additions & 0 deletions config/constants.go
@@ -2,7 +2,10 @@ package config

const (
HttpPort = 8080
+LargeFileBytes = 10_000_000
+LargeFileSeekBytes = 1_000_000
LogDirectory = "/var/log"
+MaxResultLines = 2_000
NumberOfLogLines = 25
UnsupportedFileTypes = "gz|zip|bz2|tar|tgz|7z|rar"
)
13 changes: 11 additions & 2 deletions controller/rest/v1/logs.go
@@ -26,8 +26,9 @@ func GetLogs(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
func GetLog(w http.ResponseWriter, r *http.Request, p httprouter.Params) {
start := time.Now()
linesReturned := 0
+strategy := "unknown"
defer func() {
-log.Println(r.URL.RequestURI(), "returned", linesReturned, "lines in", time.Since(start))
+log.Println(r.URL.RequestURI(), "returned", linesReturned, "lines using", strategy, "strategy in", time.Since(start))
}()

logFilename := p.ByName("log")
@@ -46,7 +47,15 @@ func GetLog(w http.ResponseWriter, r *http.Request, p httprouter.Params) {

maxLines, err := util.PositiveIntParamStrict(w, r, config.GetArguments().NumberOfLogLines, "n")
if err == nil {
-logEvents, err := service.GetLog(config.LogDirectory, logFilename, textMatch, regex, maxLines)
+getLogResult := service.GetLog(&service.GetLogParams{
+	DirectoryPath: config.LogDirectory,
+	Filename: logFilename,
+	TextMatch: textMatch,
+	Regex: regex,
+	MaxLines: maxLines,
+})
+logEvents, err := getLogResult.LogLines, getLogResult.Err
+strategy = getLogResult.Strategy

if err == nil {
linesReturned = len(logEvents)
