Jim Myhrberg 6a2254e918 feat(performance): improve core undenting performance by around 20-30x
Previously we relied heavily on regexp to find and capture all
indentation whitespace, and then to strip away the indentation shared
across all lines. This was reasonably fast. However, I wanted to see if
I could make it faster by manually iterating over the input. Turns out
doing so makes it around 20 times faster.

The code is a lot more complicated, though, so I'll attempt to break it
down. There are three main phases to it:

1. Iterate over every character of the input to locate all
   line-feed (\n) characters, storing their indexes in an integer slice.
2. Iterate over the list of line-feed indexes, and for each line-feed,
   scan forward until a non-whitespace character is found, counting how
   many whitespace characters we encountered directly after the
   line-feed. If the number is lower than our previously lowest number
   of leading whitespace characters, store that as the new lowest
   number.
3. Now that we know the lowest number of leading whitespace characters
   common across every line of the input, we can iterate over the list
   of line-feed indexes again. This time we build the final output by
   reading all characters from the line-feed index + whitespace number,
   until the next line-feed character, or end of input.

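Overall this approach yields roughly a 20x speed improvement over the
old method for most indented multi-line inputs, with smaller gains for
empty and unindented input (see the benchmarks below).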

Benchmarks, before:

    goos: darwin
    goarch: amd64
    pkg: github.com/jimeh/undent
    cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
    BenchmarkBytes/empty-8          78280611                15.18 ns/op
    BenchmarkBytes/single-line-8     2361297               515.1 ns/op
    BenchmarkBytes/single-line_indented-8             317440              3618 ns/op
    BenchmarkBytes/multi-line-8                       630370              1920 ns/op
    BenchmarkBytes/multi-line_space_indented-8        156266              7664 ns/op
    BenchmarkBytes/multi-line_space_indented_without_any_leading_line-breaks-8                155672              8168 ns/op
    BenchmarkBytes/multi-line_space_indented_with_leading_line-breaks-8                       144655              8165 ns/op
    BenchmarkBytes/multi-line_tab_indented-8                                                  206425              5462 ns/op
    BenchmarkBytes/multi-line_tab_indented_without_any_leading_line-breaks-8                  223620              5542 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_leading_line-breaks-8                         208132              5857 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8                199480              5687 ns/op
    BenchmarkBytes/multi-line_space_indented_with_blank_lines-8                               148402              8072 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_blank_lines-8                                 200929              5691 ns/op
    BenchmarkBytes/multi-line_space_indented_with_random_indentation-8                        197412              6515 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_random_indentation-8                          281493              4272 ns/op
    BenchmarkBytes/long_block_of_text-8                                                         9894            115752 ns/op
    BenchmarkString/empty-8                                                                 100000000               12.75 ns/op
    BenchmarkString/single-line-8                                                            2224165               529.0 ns/op
    BenchmarkString/single-line_indented-8                                                    314088              3784 ns/op
    BenchmarkString/multi-line-8                                                              645804              1968 ns/op
    BenchmarkString/multi-line_space_indented-8                                               149310              8103 ns/op
    BenchmarkString/multi-line_space_indented_without_any_leading_line-breaks-8               145390              8496 ns/op
    BenchmarkString/multi-line_space_indented_with_leading_line-breaks-8                      145579              8161 ns/op
    BenchmarkString/multi-line_tab_indented-8                                                 223596              5487 ns/op
    BenchmarkString/multi-line_tab_indented_without_any_leading_line-breaks-8                 214842              5641 ns/op
    BenchmarkString/multi-line_tab_indented_with_leading_line-breaks-8                        209067              5685 ns/op
    BenchmarkString/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8               210307              5584 ns/op
    BenchmarkString/multi-line_space_indented_with_blank_lines-8                              133948              9280 ns/op
    BenchmarkString/multi-line_tab_indented_with_blank_lines-8                                178296              5769 ns/op
    BenchmarkString/multi-line_space_indented_with_random_indentation-8                       206030              6222 ns/op
    BenchmarkString/multi-line_tab_indented_with_random_indentation-8                         236450              4259 ns/op
    BenchmarkString/long_block_of_text-8                                                       10000            113065 ns/op
    PASS
    ok      github.com/jimeh/undent 44.800s

Benchmarks, after:

    goos: darwin
    goarch: amd64
    pkg: github.com/jimeh/undent
    cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
    BenchmarkBytes/empty-8          596493562                2.074 ns/op
    BenchmarkBytes/single-line-8    20044598                60.64 ns/op
    BenchmarkBytes/single-line_indented-8           12449749                84.43 ns/op
    BenchmarkBytes/multi-line-8                      5086376               232.3 ns/op
    BenchmarkBytes/multi-line_space_indented-8       3077774               400.4 ns/op
    BenchmarkBytes/multi-line_space_indented_without_any_leading_line-breaks-8               3011881               386.6 ns/op
    BenchmarkBytes/multi-line_space_indented_with_leading_line-breaks-8                      3034299               402.9 ns/op
    BenchmarkBytes/multi-line_tab_indented-8                                                 4500271               266.2 ns/op
    BenchmarkBytes/multi-line_tab_indented_without_any_leading_line-breaks-8                 4355886               277.5 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_leading_line-breaks-8                        3758012               289.5 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8               4425787               271.9 ns/op
    BenchmarkBytes/multi-line_space_indented_with_blank_lines-8                              3035809               412.2 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_blank_lines-8                                3771512               334.2 ns/op
    BenchmarkBytes/multi-line_space_indented_with_random_indentation-8                       4461404               275.6 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_random_indentation-8                         6960343               174.6 ns/op
    BenchmarkBytes/long_block_of_text-8                                                       315788              3776 ns/op
    BenchmarkString/empty-8                                                                 338024905                3.761 ns/op
    BenchmarkString/single-line-8                                                           20067831                59.28 ns/op
    BenchmarkString/single-line_indented-8                                                  13826002                88.16 ns/op
    BenchmarkString/multi-line-8                                                             4451938               261.6 ns/op
    BenchmarkString/multi-line_space_indented-8                                              2911797               411.1 ns/op
    BenchmarkString/multi-line_space_indented_without_any_leading_line-breaks-8              2699631               416.5 ns/op
    BenchmarkString/multi-line_space_indented_with_leading_line-breaks-8                     2737174               436.3 ns/op
    BenchmarkString/multi-line_tab_indented-8                                                4208000               304.6 ns/op
    BenchmarkString/multi-line_tab_indented_without_any_leading_line-breaks-8                4029422               295.8 ns/op
    BenchmarkString/multi-line_tab_indented_with_leading_line-breaks-8                       3929960               310.3 ns/op
    BenchmarkString/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8              3978992               292.5 ns/op
    BenchmarkString/multi-line_space_indented_with_blank_lines-8                             2829766               428.5 ns/op
    BenchmarkString/multi-line_tab_indented_with_blank_lines-8                               3788185               304.8 ns/op
    BenchmarkString/multi-line_space_indented_with_random_indentation-8                      4104337               279.4 ns/op
    BenchmarkString/multi-line_tab_indented_with_random_indentation-8                        7092417               177.4 ns/op
    BenchmarkString/long_block_of_text-8                                                      283140              4398 ns/op
    PASS
    ok      github.com/jimeh/undent 47.252s
2021-02-22 22:42:27 +00:00

undent

Go package which removes leading indentation/white-space from strings.


s := undent.String(`
	{
	  "hello": "world"
	}`,
)
fmt.Println(s)

{
  "hello": "world"
}

Documentation

Please see the Go Reference for documentation and examples.

Benchmarks

Benchmark reports and graphs are available here: https://jimeh.me/undent/dev/bench/

License

MIT
