Stats protip -- probability theory becomes a lot less mysterious and arcane if you separate out the theory (i.e. the axiomatic mathematical structure) from the philosophy (i.e. how the mathematical structure is applied to model actual systems).
Despite its reputation, the mathematics of probability theory is actually quite straightforward -- it's just the basic rules for how to consistently distribute or allocate a finite amount of stuff across a space. Probabilities are just the relative proportions of that allocation.
On spaces with a finite number of elements -- even a countable number of elements -- the rules for this allocation are particularly simple. On continuous spaces with an uncountably infinite number of elements the math becomes more complicated but the concepts are the same!
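To make the finite case concrete, here's a minimal sketch of "allocation" in Python. The function names (`normalize`, `prob`) and the three-element space are made up purely for illustration; nothing here is specific to any library.

```python
from fractions import Fraction

def normalize(allocation):
    """Turn a finite allocation of 'stuff' into relative proportions."""
    total = sum(allocation.values())
    if total <= 0:
        raise ValueError("need a positive total amount to allocate")
    if any(amount < 0 for amount in allocation.values()):
        raise ValueError("allocations cannot be negative")
    return {x: Fraction(amount, total) for x, amount in allocation.items()}

def prob(p, subset):
    """Probability of a subset: add up the proportions of its elements."""
    return sum((p[x] for x in subset), Fraction(0))

# Hypothetical allocation of 10 units of stuff across a three-element space.
p = normalize({"a": 5, "b": 3, "c": 2})

# Everything is allocated somewhere, and disjoint pieces add up.
whole_space = prob(p, {"a", "b", "c"})
a_or_b = prob(p, {"a", "b"})
```

Note that nothing in this sketch says what the "stuff" is; that only enters when we apply it to something.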
Notice that neither "frequency" nor "uncertainty" nor "degrees of belief" makes any appearance in this definition. None of those concepts are fundamental to the _mathematical_ structure of probability theory. Instead those are all _applications_ of probability theory.
Want to model the frequency of "events" that might manifest in some system? Use probability theory to distribute some number of "trials" to those events. Some receive more trials than others but all trials have to result in some event. The relative proportions are frequencies!
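As a rough sketch of the frequency application: here the conserved "stuff" is a fixed number of trials, and each trial lands on exactly one event. The trial outcomes below are hypothetical data, not results from anything.

```python
from collections import Counter

def frequencies(trials):
    """Distribute the trials across events and report relative proportions."""
    counts = Counter(trials)
    n = len(trials)
    return {event: count / n for event, count in counts.items()}

# Hypothetical record of four trials; every trial results in some event.
trials = ["heads", "tails", "heads", "heads"]
freq = frequencies(trials)
total = sum(freq.values())
```

The proportions in `freq` behave exactly like probabilities because the trials were allocated, not because frequencies are built into the math.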
You have some uncertainty about the behavior of a system? If you can quantify the possible behaviors as a space then we can allocate "certainty" across that space to give relative degrees of belief/information/insert your favorite term.
Oh, but we're not done yet. What if the thing you want to distribute is an actual conserved physical quantity like mass or charge? That's right, it's probability theory that tells us how to distribute that quantity across the physical object.
But, you might say, <supposed expert on frequentist statistics> said that probabilities are _derived_ from long term frequencies! Or the <supposed expert on Bayesian inference> said that probabilities are _derived_ from the Cox axioms for degrees of belief!
People have been making these claims for _centuries_ and unfortunately none of them quite hold up. Even for the simplest systems, sequences of long term frequencies don't all converge to the same frequency without excising some "pathological" sequences.
The problem is defining those pathological sequences without resorting to probability theory and hence introducing circular logic. There are some interesting attempts to do this using computation theory but let's just say that it's far from straightforward.
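For a feel of what a "pathological" sequence can look like, here's one hand-built example (my own construction for illustration, not one taken from the literature referenced above): alternate blocks of heads and tails whose lengths double each time. The running frequency of heads never settles down.

```python
def doubling_block_sequence(num_blocks):
    """Blocks of lengths 1, 2, 4, ...; even blocks are heads, odd are tails."""
    flips = []
    for k in range(num_blocks):
        flips.extend([k % 2 == 0] * (2 ** k))
    return flips

def running_frequency(flips):
    """Fraction of heads among the first n flips, for every n."""
    heads = 0
    out = []
    for n, flip in enumerate(flips, start=1):
        heads += flip
        out.append(heads / n)
    return out

num_blocks = 16
flips = doubling_block_sequence(num_blocks)
freqs = running_frequency(flips)

# Block k ends at flip number 2**(k + 1) - 1.
block_ends = [2 ** (k + 1) - 1 for k in range(num_blocks)]
freq_at_block_ends = [freqs[n - 1] for n in block_ends]
```

At the end of each tails block the running frequency is 1/3, while at the end of each heads block it sits just above 2/3, so it keeps oscillating no matter how long the sequence runs.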
Kolmogorov didn't derive his axioms from asymptotic frequencies -- he just found math that was consistent with what people expected of them. Adopting frequentist terminology for those axioms and claiming that they could be applied to only frequencies went far beyond the math itself.
On the other side, what's wrong with the Cox axioms, which define criteria that any mathematical quantification of "common sense" should satisfy? They are equivalent to probability theory on finite spaces but not on continuous spaces.
There have been various attempts to extend the Cox axioms with additional criteria that are equivalent to the Kolmogorov axioms of probability theory, but none of them have the "common sense" intuition of the base Cox axioms.
We can avoid all of these problems by just letting probability theory be a foundational theory for the self-consistent allocation of a conserved quantity. Although this theory isn't fully specified by frequencies/uncertainties/etc., it is _consistent_ with those concepts.
Consequently once we've defined probability theory we can use it to consistently model those concepts. Indeed there's no reason why we can't use probability theory to model multiple concepts _at the same time_.
I can't speak for all Bayesians (after all, Good made it clear many decades ago that there are already 46,656 varieties of Bayesians) but for me the appeal of Bayesian methods is that they don't restrict the application of probability theory.
Frequencies, uncertainty, and physical distribution all fall into the same, unified mathematical framework. From this perspective Bayesian inference isn't mutually exclusive with frequentist inference -- it's a _pure generalization_ of frequentist inference.
Just think of a Bayesian analysis of the common coin flipping example -- we have the bias of the coin, the frequencies of flip outcomes, and uncertainty about that bias. These are three distinct notions of probability all interwoven with each other!
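Here's a minimal sketch of that coin example, just to show the three notions sitting side by side. The uniform Beta(1, 1) prior, the bias value of 0.7, the 100 flips, and the function names are all assumptions chosen for illustration; the conjugate Beta update itself is standard.

```python
import random

def simulate_flips(bias, n, seed=0):
    """Notion 1: the coin's bias, used here to generate flips (True = heads)."""
    rng = random.Random(seed)
    return [rng.random() < bias for _ in range(n)]

def beta_posterior(flips, a=1.0, b=1.0):
    """Notion 3: uncertainty about the bias, as a Beta(a, b) updated by the flips."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails

bias = 0.7  # assumed value, only for the simulation
flips = simulate_flips(bias, 100)

# Notion 2: the observed frequency of heads among the flips.
freq_heads = sum(flips) / len(flips)

# Posterior over the bias under the assumed uniform prior, and its mean.
a_post, b_post = beta_posterior(flips)
post_mean = a_post / (a_post + b_post)
```

All three quantities are proportions of some allocation, but they are allocations of different things over different spaces.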
No wonder people are overwhelmed by statistics! Importantly, however, none of that difficulty is arising from the mathematics of probability theory itself. It's all in the application of probability theory.
So let's do everyone trying to learn statistics a favor and separate out the universal mathematics of probability theory from the heterogeneous philosophies for how we can apply probability theory and be honest about the limitations of those philosophies.
Oh, and for more check out my chapter on probability theory, betanalpha.github.io/assets/case_st…, especially Section 6. This is due for a big update but the basics are still good!