Commit Graph

8 Commits

Author SHA1 Message Date
84715b90d0 fix(perf): minor improvement when no indent is found 2024-10-21 01:43:49 +01:00
178d9b5bf6 perf: simplify line-feed and minimum indentation loops
This is based on performance improvements suggested by OpenAI's ChatGPT.
2022-12-04 23:49:34 +00:00
5a4b199462 docs(godoc): fix typo in doc string for Print function 2021-02-22 22:51:46 +00:00
6a2254e918 feat(performance): improve core undenting performance by around 20-30x
Previously we relied heavily on regexp to filter out and grab all
indentation white space, and then to strip away indentation shared
across all lines. This was reasonably fast. However I wanted to see if I
could make it faster by manually iterating over the input. Turns out
doing so makes it around 20 times faster.

The code is a lot more complicated, but I'll attempt to break it
down. There are three main phases to it:

1. Iterate over every character of the input to locate all
   line-feed (\n) characters, storing their indexes in an integer slice.
2. Iterate over the list of line-feed indexes, and for each line-feed,
   scan forward until a non-whitespace character is found, counting how
   many whitespace characters we encountered directly after the
   line-feed. If the number is lower than our previously lowest number
   of leading whitespace characters, store that as the new lowest
   number.
3. Now that we know the lowest number of leading whitespace characters
   common across every line of the input, we can iterate over the list
   of line-feed indexes again. This time we build the final output by
   reading all characters from the line-feed index + whitespace number,
   until the next line-feed character, or end of input.

Overall this approach yields a 15-20x speed improvement over the old
method.
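
The three phases can be sketched roughly as follows. This is a minimal
illustration, not the code from this commit: the name undentSketch is
hypothetical, it records line-start indexes (the index after each
line-feed) rather than the line-feed indexes themselves, and it assumes
that only spaces and tabs count as indentation and that blank lines are
ignored when finding the minimum.

```go
package main

import "fmt"

// undentSketch is a hypothetical illustration of the three phases.
// Assumptions: only ' ' and '\t' count as indentation, and lines that
// contain nothing but whitespace do not affect the minimum.
func undentSketch(s string) string {
	if len(s) == 0 {
		return s
	}

	// Phase 1: record where every line starts (index 0, plus the index
	// directly after each line-feed).
	starts := []int{0}
	for i := 0; i < len(s); i++ {
		if s[i] == '\n' {
			starts = append(starts, i+1)
		}
	}

	// Phase 2: find the lowest number of leading whitespace characters
	// across all non-blank lines.
	minIndent := -1
	for _, start := range starts {
		n := 0
		for start+n < len(s) && (s[start+n] == ' ' || s[start+n] == '\t') {
			n++
		}
		if start+n >= len(s) || s[start+n] == '\n' {
			continue // blank line, ignore
		}
		if minIndent == -1 || n < minIndent {
			minIndent = n
		}
	}
	if minIndent <= 0 {
		return s // nothing to strip
	}

	// Phase 3: rebuild the output, skipping up to minIndent whitespace
	// characters at the start of each line.
	out := make([]byte, 0, len(s))
	for i, start := range starts {
		end := len(s)
		if i+1 < len(starts) {
			end = starts[i+1] // keeps the line's trailing line-feed
		}
		skip := 0
		for skip < minIndent && start+skip < end &&
			(s[start+skip] == ' ' || s[start+skip] == '\t') {
			skip++
		}
		out = append(out, s[start+skip:end]...)
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", undentSketch("    hello\n      world\n    !"))
}
```

Working on byte indexes like this avoids regexp matching and needs only
one output allocation, which is presumably where most of the speed-up
comes from.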

Benchmarks, before:

    goos: darwin
    goarch: amd64
    pkg: github.com/jimeh/undent
    cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
    BenchmarkBytes/empty-8          78280611                15.18 ns/op
    BenchmarkBytes/single-line-8     2361297               515.1 ns/op
    BenchmarkBytes/single-line_indented-8             317440              3618 ns/op
    BenchmarkBytes/multi-line-8                       630370              1920 ns/op
    BenchmarkBytes/multi-line_space_indented-8        156266              7664 ns/op
    BenchmarkBytes/multi-line_space_indented_without_any_leading_line-breaks-8                155672              8168 ns/op
    BenchmarkBytes/multi-line_space_indented_with_leading_line-breaks-8                       144655              8165 ns/op
    BenchmarkBytes/multi-line_tab_indented-8                                                  206425              5462 ns/op
    BenchmarkBytes/multi-line_tab_indented_without_any_leading_line-breaks-8                  223620              5542 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_leading_line-breaks-8                         208132              5857 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8                199480              5687 ns/op
    BenchmarkBytes/multi-line_space_indented_with_blank_lines-8                               148402              8072 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_blank_lines-8                                 200929              5691 ns/op
    BenchmarkBytes/multi-line_space_indented_with_random_indentation-8                        197412              6515 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_random_indentation-8                          281493              4272 ns/op
    BenchmarkBytes/long_block_of_text-8                                                         9894            115752 ns/op
    BenchmarkString/empty-8                                                                 100000000               12.75 ns/op
    BenchmarkString/single-line-8                                                            2224165               529.0 ns/op
    BenchmarkString/single-line_indented-8                                                    314088              3784 ns/op
    BenchmarkString/multi-line-8                                                              645804              1968 ns/op
    BenchmarkString/multi-line_space_indented-8                                               149310              8103 ns/op
    BenchmarkString/multi-line_space_indented_without_any_leading_line-breaks-8               145390              8496 ns/op
    BenchmarkString/multi-line_space_indented_with_leading_line-breaks-8                      145579              8161 ns/op
    BenchmarkString/multi-line_tab_indented-8                                                 223596              5487 ns/op
    BenchmarkString/multi-line_tab_indented_without_any_leading_line-breaks-8                 214842              5641 ns/op
    BenchmarkString/multi-line_tab_indented_with_leading_line-breaks-8                        209067              5685 ns/op
    BenchmarkString/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8               210307              5584 ns/op
    BenchmarkString/multi-line_space_indented_with_blank_lines-8                              133948              9280 ns/op
    BenchmarkString/multi-line_tab_indented_with_blank_lines-8                                178296              5769 ns/op
    BenchmarkString/multi-line_space_indented_with_random_indentation-8                       206030              6222 ns/op
    BenchmarkString/multi-line_tab_indented_with_random_indentation-8                         236450              4259 ns/op
    BenchmarkString/long_block_of_text-8                                                       10000            113065 ns/op
    PASS
    ok      github.com/jimeh/undent 44.800s

Benchmarks, after:

    goos: darwin
    goarch: amd64
    pkg: github.com/jimeh/undent
    cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
    BenchmarkBytes/empty-8          596493562                2.074 ns/op
    BenchmarkBytes/single-line-8    20044598                60.64 ns/op
    BenchmarkBytes/single-line_indented-8           12449749                84.43 ns/op
    BenchmarkBytes/multi-line-8                      5086376               232.3 ns/op
    BenchmarkBytes/multi-line_space_indented-8       3077774               400.4 ns/op
    BenchmarkBytes/multi-line_space_indented_without_any_leading_line-breaks-8               3011881               386.6 ns/op
    BenchmarkBytes/multi-line_space_indented_with_leading_line-breaks-8                      3034299               402.9 ns/op
    BenchmarkBytes/multi-line_tab_indented-8                                                 4500271               266.2 ns/op
    BenchmarkBytes/multi-line_tab_indented_without_any_leading_line-breaks-8                 4355886               277.5 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_leading_line-breaks-8                        3758012               289.5 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8               4425787               271.9 ns/op
    BenchmarkBytes/multi-line_space_indented_with_blank_lines-8                              3035809               412.2 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_blank_lines-8                                3771512               334.2 ns/op
    BenchmarkBytes/multi-line_space_indented_with_random_indentation-8                       4461404               275.6 ns/op
    BenchmarkBytes/multi-line_tab_indented_with_random_indentation-8                         6960343               174.6 ns/op
    BenchmarkBytes/long_block_of_text-8                                                       315788              3776 ns/op
    BenchmarkString/empty-8                                                                 338024905                3.761 ns/op
    BenchmarkString/single-line-8                                                           20067831                59.28 ns/op
    BenchmarkString/single-line_indented-8                                                  13826002                88.16 ns/op
    BenchmarkString/multi-line-8                                                             4451938               261.6 ns/op
    BenchmarkString/multi-line_space_indented-8                                              2911797               411.1 ns/op
    BenchmarkString/multi-line_space_indented_without_any_leading_line-breaks-8              2699631               416.5 ns/op
    BenchmarkString/multi-line_space_indented_with_leading_line-breaks-8                     2737174               436.3 ns/op
    BenchmarkString/multi-line_tab_indented-8                                                4208000               304.6 ns/op
    BenchmarkString/multi-line_tab_indented_without_any_leading_line-breaks-8                4029422               295.8 ns/op
    BenchmarkString/multi-line_tab_indented_with_leading_line-breaks-8                       3929960               310.3 ns/op
    BenchmarkString/multi-line_tab_indented_with_tabs_and_spaces_after_indent-8              3978992               292.5 ns/op
    BenchmarkString/multi-line_space_indented_with_blank_lines-8                             2829766               428.5 ns/op
    BenchmarkString/multi-line_tab_indented_with_blank_lines-8                               3788185               304.8 ns/op
    BenchmarkString/multi-line_space_indented_with_random_indentation-8                      4104337               279.4 ns/op
    BenchmarkString/multi-line_tab_indented_with_random_indentation-8                        7092417               177.4 ns/op
    BenchmarkString/long_block_of_text-8                                                      283140              4398 ns/op
    PASS
    ok      github.com/jimeh/undent 47.252s
2021-02-22 22:42:27 +00:00
5cae4bc420 feat(print): add Print, Printf, Fprint, and Fprintf functions 2021-02-20 22:09:49 +00:00
5dbdbbf341 fix(bytes): change Bytes function to accept string input but return a byte slice
The old method signature was just nonsensical, as you would always be
providing indented values via a string literal. So it makes much more
sense to have all methods accept a string argument, and then return
different types.

This also allows use of a `Bytesf` method.

This is technically a breaking change, but I'm classifying it as a
bugfix because the old method signature was basically useless.
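
As a rough sketch of the resulting API shape: every function takes a
string and only the return type differs. The bodies below are
assumptions for illustration, not the library's code. In particular the
undent helper is a simplified stand-in, and undenting the format string
before formatting in Bytesf is a guess at the order of operations.

```go
package main

import (
	"fmt"
	"strings"
)

// undent is a simplified, hypothetical stand-in for the real undenting
// logic: it strips the indentation shared by all non-blank lines.
func undent(s string) string {
	lines := strings.Split(s, "\n")
	minIndent := -1
	for _, l := range lines {
		trimmed := strings.TrimLeft(l, " \t")
		if trimmed == "" {
			continue // blank lines do not affect the minimum
		}
		if n := len(l) - len(trimmed); minIndent == -1 || n < minIndent {
			minIndent = n
		}
	}
	if minIndent <= 0 {
		return s
	}
	for i, l := range lines {
		if len(l) >= minIndent {
			lines[i] = l[minIndent:]
		} else {
			lines[i] = "" // whitespace-only line shorter than the indent
		}
	}
	return strings.Join(lines, "\n")
}

// String accepts a string and returns a string.
func String(s string) string { return undent(s) }

// Bytes accepts a string (not a byte slice) and returns a byte slice.
func Bytes(s string) []byte { return []byte(undent(s)) }

// Bytesf is possible because the input is a string usable as a format.
// Assumption: the format is undented first, then formatted.
func Bytesf(format string, a ...interface{}) []byte {
	return []byte(fmt.Sprintf(undent(format), a...))
}

func main() {
	fmt.Printf("%q\n", Bytes("  a\n  b"))
	fmt.Printf("%q\n", Bytesf("  id: %d\n  ok", 7))
}
```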
2020-12-14 14:52:32 +00:00
b2057429a1 fix(whitespace): remove leading line-break from input
This effectively cleans up what I consider syntactical sugar required
due to Go's syntax. For example:

    str := undent.String(`
        hello
        world`,
    )

In the above example I would consider the initial line-break after the
opening back-tick (`) character to be syntactical sugar, and hence it
should be discarded from the final undented string.

However if the literal string contains more than one initial line-break,
only the first one should be removed, as the rest would intentionally be
part of the input.
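
A minimal sketch of that rule, assuming line-breaks are plain "\n"
characters (the helper name trimFirstLineBreak is hypothetical and not
part of the package):

```go
package main

import "fmt"

// trimFirstLineBreak is a hypothetical helper: it drops a single leading
// line-feed, if present, and leaves any further line-feeds untouched.
// Assumption: only "\n" is treated as a line-break, not "\r\n".
func trimFirstLineBreak(s string) string {
	if len(s) > 0 && s[0] == '\n' {
		return s[1:]
	}
	return s
}

func main() {
	fmt.Printf("%q\n", trimFirstLineBreak("\nhello\nworld"))
	fmt.Printf("%q\n", trimFirstLineBreak("\n\nhello"))
}
```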
2020-12-07 10:43:26 +00:00
6cdaf8a476 feat(undent): add initial implementation of String, Stringf, and Bytes 2020-11-26 03:02:09 +00:00