This seems to me like wishful thinking built on tightly circular reasoning: you are presuming what you want to conclude.
The subjective states of minds are inherently unknowable. The only way, therefore, that the "understanding" of an AI can be assessed is by its correlates with our own understanding. This is the foundation of the Turing test, after all: we have to ask the AI what it understands. Ask it whether the apple was paid for or stolen and it will quite likely give the correct answer, once it has these concepts. Ask it about concepts that it does not have, and of course it cannot infer them, any more than a member of a tribe without a concept of property could infer theft.
A human mind is just as incapable of inferring tacit or pragmatic meaning if it is deprived of the specific tacit or pragmatic knowledge required for the particular case. This, therefore, is not a difference in nature but a lack of experience. And the internet is full of information that an AI could use to supply that experience. These systems can already combine images and text; adding correlates for sound, touch and smell is not a new problem, but a generalisation and application of one that has already been solved.
And if that is not enough for you, there will soon enough be robots with hearing, touch and smell to connect these experiences to the physical world.