Glass onion: Compositional text-to-image generation using diffusion models and LLMs