Catchall 'encoding needs to get un-crap-ified' ticket

## Background

This is a follow-on from #289 and also encompasses some or all of #241, #274, #262, #332, #242, #321, and #338. My goal is to make this general story better, ASAP, while leaving remaining work clearly defined (regardless of who ends up taking a swing at it).

I need to doublecheck some of those linked issues that I've not spent a lot of time on, but the bulk of discussion is in #289 and #274, much of it driven by @pfmoore who (as a Python 3 on Windows user) serves as a very competent & useful counterweight to myself (a default-Python-2 on Unix developer). Any solution that works for both of us is likely one that serves most users.

The basics that fall out of what's been said there so far:
- Where & how bytes/strings are decoded/encoded, needs to be better defined/documented/contracted;
- Encoding/decoding should be pushed as far out as possible, to the "edges" of the code, to limit how often it happens & becomes an issue;
  - But also centralized further - the code handling it needs to be clearly labeled & no faffing about should happen outside that one area.
- The default behavior should encompass as many use cases as is reasonable to do, and should degrade as gracefully as it can otherwise (re: exceptions, vs replacements, vs ???);
  - But users should also be given the ability to work around that default cleanly & fully whenever it doesn't function for their use case.
  - Ideally, we'll balance these two things well instead of using "the user can twiddle the knobs we provided" as an excuse to let the default behavior suffer. But this is hard and time is short, so...
## TODO
- [x] Triple check existing locations where we explicitly or implicitly encode/decode, and move them all somewhere centralized & obvious (without otherwise altering them).
  - Right now the only explicit mentions of `(encode|decode)\(` are a bare handful in `runner.py`, so there's likely others that are implicit instead.
  - Not sure what the "most obvious" place is, but given most of this is directly tied to `run()` and friends, it may still just live in the `Runner` class or `runner.py` generally.
- [x] Identify & implement additional tests (integration or regular) for use cases not represented by the `funky_characters_in_stdout` integration tests
  - Those tests should have caught some of the basic e.g. `cat tree -> kaboom` bugs that snuck in during 0.11.x, but did not.
  - I identified the basic reason why & have updated it so they fail again, in a branch.
  - But they only represent a single slice of the problem - look at the discussion in #274, as well as the other top level issues above, and see what other tests can be added that currently fail.
  - Ideally at least some of these would go in the regular, not integration, suite.
- [x] Get those tests passing, either based off of #274 or with something new
- [x] If not implied in above: probably makes sense to add more encoding config toggles (maybe up to "one for each stream") so users can override them granularly; the single `encoding` is nice and all, but won't work in cases where e.g. one's invoke-invoking terminal is Encoding A but the environment the subprocess is operating in is using Encoding B.
  - Made this #353
- [x] Handle, or surface into this or other tickets, any TODOs added in the branch left undone
- [ ] See if @pfmoore has time for immediate feedback to those changes (sadly I wasn't able to give him a thumbs up for his offer of making more PRs on his end, back in Feb) just in case.
- [x] Re: comments below, update `read_proc_output` so we always expect bytes:
  - [x] Enhance docstring(s) to make it clear that any file-like objects given explicitly, need to yield bytes
  - [x] Remove the "if" statement
  - [x] Update our test suites to use `BytesIO` instead of `StringIO` (& any related string literal changes, or maybe a wrapper if lazy).
- [x] Update changelog entry for any fully-closed tickets & their authors


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Catchall 'encoding needs to get un-crap-ified' ticket #350

Background

TODO

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Catchall 'encoding needs to get un-crap-ified' ticket #350

Description

Background

TODO

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions