fix bug in getLargeLog() with both text and regex matches
update README.md
mikebd committed Nov 3, 2023
1 parent a237860 commit eecbae8
Showing 3 changed files with 90 additions and 103 deletions.
172 changes: 81 additions & 91 deletions README.md
@@ -34,9 +34,6 @@ Some assumptions could be considered TODO items for future enhancement
* Hot searches could have dedicated caches that are eagerly refreshed when the file changes
* Testing coverage should be added for the `controller` package
* Add `GOMAXPROCS` to Dockerfile or set that in the code
- * There is a [get-large-log](https://github.com/mikebd/go-read-var-log/tree/get-large-log) branch where I hope to
-   experiment with a more efficient way to read very large files, but it is not yet complete and is not formally
-   part of this submission

## Endpoints

@@ -130,23 +127,23 @@ wifi.log
```bash
❯ go run ./main -p 80
...
- logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.75945789s
- logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.534780081s
- logs.go:30: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$ returned 1 lines in 2.148826909s
- logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.427393941s
- logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.447307855s
- logs.go:30: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$ returned 1 lines in 1.421464319s
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.23877778s
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.202586854s
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1 returned 1 lines using large strategy in 1.198800304s
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 856.594448ms
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 866.171423ms
+ logs.go:31: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1 returned 1 lines using large strategy in 863.319268ms
```

☝️ was generated by the following `curl` commands:

```bash
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$"
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
- ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=error&r=\sfecig$&n=1"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
+ ❯ curl "localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1"
```

The `curl` output for all of the above is:
@@ -161,128 +158,121 @@

### Small file

- `/var/log/daily.out` is a 419KB file with 6,892 lines.
+ `/var/log/daily.out` is a 498KB file with 7,992 lines.

```bash
❯ ab -c 6 -k -n 10000 'localhost/api/v1/logs/daily.out'

Document Path: /api/v1/logs/daily.out
- Document Length: 1765 bytes
+ Document Length: 1930 bytes

Concurrency Level: 6
- Time taken for tests: 3.193 seconds
+ Time taken for tests: 2.062 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
- Total transferred: 19080000 bytes
- HTML transferred: 17650000 bytes
- Requests per second: 3131.67 [#/sec] (mean)
- Time per request: 1.916 [ms] (mean)
- Time per request: 0.319 [ms] (mean, across all concurrent requests)
- Transfer rate: 5835.18 [Kbytes/sec] received
+ Total transferred: 20730000 bytes
+ HTML transferred: 19300000 bytes
+ Requests per second: 4849.47 [#/sec] (mean)
+ Time per request: 1.237 [ms] (mean)
+ Time per request: 0.206 [ms] (mean, across all concurrent requests)
+ Transfer rate: 9817.34 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
- Processing: 1 2 0.5 2 5
- Waiting: 0 2 0.5 2 5
- Total: 1 2 0.5 2 5
+ Processing: 0 1 0.3 1 3
+ Waiting: 0 1 0.3 1 3
+ Total: 0 1 0.3 1 3

Percentage of the requests served within a certain time (ms)
- 50% 2
- 66% 2
- 75% 2
- 80% 2
- 90% 3
- 95% 3
- 98% 3
- 99% 3
- 100% 5 (longest request)
+ 50% 1
+ 66% 1
+ 75% 1
+ 80% 1
+ 90% 2
+ 95% 2
+ 98% 2
+ 99% 2
+ 100% 3 (longest request)
```
### Large file with fixed text and regex matching
#### Non-default GOMAXPROCS=8
- When run for 10,000 requests, it failed somewhere between 4,000-5,000 requests with a timeout as response times escalated to over 1 minute.
- For > 2,000 requests (started as 10,000 but interrupted after 2,000):
```bash
- ❯ ab -c 6 -k -n 10000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$'
+ ❯ ab -c 6 -k -n 1000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1'

- Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$
+ Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1
Document Length: 117 bytes

Concurrency Level: 6
- Time taken for tests: 1036.804 seconds
- Complete requests: 2105
+ Time taken for tests: 259.356 seconds
+ Complete requests: 1000
Failed requests: 0
- Keep-Alive requests: 2105
- Total transferred: 545195 bytes
- HTML transferred: 246285 bytes
- Requests per second: 2.03 [#/sec] (mean)
- Time per request: 2955.261 [ms] (mean)
- Time per request: 492.543 [ms] (mean, across all concurrent requests)
- Transfer rate: 0.51 [Kbytes/sec] received
+ Keep-Alive requests: 1000
+ Total transferred: 259000 bytes
+ HTML transferred: 117000 bytes
+ Requests per second: 3.86 [#/sec] (mean)
+ Time per request: 1556.135 [ms] (mean)
+ Time per request: 259.356 [ms] (mean, across all concurrent requests)
+ Transfer rate: 0.98 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
- Processing: 1905 2946 1771.9 2766 36280
- Waiting: 1905 2946 1771.9 2766 36280
- Total: 1905 2946 1771.9 2766 36280
+ Processing: 909 1550 51.8 1541 1788
+ Waiting: 908 1550 51.8 1540 1788
+ Total: 909 1550 51.8 1541 1788

Percentage of the requests served within a certain time (ms)
- 50% 2766
- 66% 2811
- 75% 2853
- 80% 2885
- 90% 3007
- 95% 3225
- 98% 4481
- 99% 6722
- 100% 36280 (longest request)
+ 50% 1541
+ 66% 1559
+ 75% 1568
+ 80% 1578
+ 90% 1605
+ 95% 1632
+ 98% 1692
+ 99% 1757
+ 100% 1788 (longest request)
```
#### Default GOMAXPROCS=1
- This fails for `-n 10000` with a timeout, but "succeeds" (with poor performance) for `-n 50`.
There is a lot of room for optimization here if this is a use case that must be supported.
```bash
- ❯ ab -c 6 -k -n 50 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$'
+ ❯ ab -c 6 -k -n 1000 'localhost/api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1'

- Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$
+ Document Path: /api/v1/logs/1GB-9million.log?q=|error|&r=\sfecig$&n=1
Document Length: 117 bytes

Concurrency Level: 6
- Time taken for tests: 31.893 seconds
- Complete requests: 50
+ Time taken for tests: 279.119 seconds
+ Complete requests: 1000
Failed requests: 0
- Keep-Alive requests: 50
- Total transferred: 12950 bytes
- HTML transferred: 5850 bytes
- Requests per second: 1.57 [#/sec] (mean)
- Time per request: 3827.187 [ms] (mean)
- Time per request: 637.865 [ms] (mean, across all concurrent requests)
- Transfer rate: 0.40 [Kbytes/sec] received
+ Keep-Alive requests: 1000
+ Total transferred: 259000 bytes
+ HTML transferred: 117000 bytes
+ Requests per second: 3.58 [#/sec] (mean)
+ Time per request: 1674.712 [ms] (mean)
+ Time per request: 279.119 [ms] (mean, across all concurrent requests)
+ Transfer rate: 0.91 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
- Connect: 0 0 0.1 0 0
- Processing: 1832 3469 1508.0 2750 6374
- Waiting: 1832 3469 1508.0 2750 6374
- Total: 1832 3469 1508.0 2750 6374
+ Connect: 0 0 0.0 0 0
+ Processing: 1276 1663 53.5 1654 2018
+ Waiting: 1276 1663 53.5 1653 2018
+ Total: 1276 1663 53.5 1654 2018

Percentage of the requests served within a certain time (ms)
- 50% 2750
- 66% 3738
- 75% 5137
- 80% 5368
- 90% 5905
- 95% 6206
- 98% 6374
- 99% 6374
- 100% 6374 (longest request)
+ 50% 1654
+ 66% 1667
+ 75% 1679
+ 80% 1687
+ 90% 1714
+ 95% 1745
+ 98% 1803
+ 99% 1932
+ 100% 2018 (longest request)
```
17 changes: 9 additions & 8 deletions service/getLog.go
@@ -83,14 +83,15 @@ func getLargeLog(params *GetLogParams) GetLogResult {
for scanner.Scan() {
line := scanner.Text()

-		if params.matchRequested() {
-			// Cheaper tests first, short circuit more expensive tests
-			if params.textMatchRequested() && strings.Contains(line, params.TextMatch) {
-				blockResult = append(blockResult, line)
-			} else if params.regexMatchRequested() && params.Regex.MatchString(line) {
-				blockResult = append(blockResult, line)
-			}
-		} else {
+		matched := true
+		// Cheaper tests first, short circuit more expensive tests
+		if params.textMatchRequested() && !strings.Contains(line, params.TextMatch) {
+			matched = false
+		}
+		if matched && params.regexMatchRequested() && !params.Regex.MatchString(line) {
+			matched = false
+		}
+		if matched {
			blockResult = append(blockResult, line)
		}
	}
4 changes: 0 additions & 4 deletions service/getLogTypes.go
@@ -31,10 +31,6 @@ func (p *GetLogParams) regexMatchRequested() bool {
return p.Regex != nil
}

- func (p *GetLogParams) matchRequested() bool {
- 	return p.textMatchRequested() || p.regexMatchRequested()
- }

type GetLogResult struct {
LogLines []string
Strategy string
