Master Databricks Certified Generative AI Engineer Associate: Your Gateway to AI Excellence
A Generative AI Engineer at an automotive company would like to build a question-answering chatbot for customers to inquire about their vehicles. They have a database containing various documents covering different vehicle makes, their hardware parts, and common maintenance information.
Which of the following components will NOT be useful in building such a chatbot?
Correct : B
The task involves building a question-answering chatbot for an automotive company using a database of vehicle-related documents. The chatbot must efficiently process customer inquiries and provide accurate responses. Let's evaluate each component to determine which is not useful, per Databricks Generative AI Engineer principles.
Option A: Response-generating LLM
An LLM is essential for generating natural language responses to customer queries based on retrieved information. This is a core component of any chatbot.
Databricks Reference: 'The response-generating LLM processes retrieved context to produce coherent answers' ('Building LLM Applications with Databricks,' 2023).
Option B: Invite users to submit long, rather than concise, questions
Encouraging long questions is a user interaction design choice, not a technical component of the chatbot's architecture. Moreover, long, verbose questions can complicate intent detection and retrieval, reducing efficiency and accuracy, which runs counter to best practices for chatbot design. Concise questions are typically preferred for clarity and performance.
Databricks Reference: While not explicitly stated, Databricks' 'Generative AI Cookbook' emphasizes efficient query processing, implying that simpler, focused inputs improve LLM performance. Inviting long questions doesn't align with this.
Option C: Vector database
A vector database stores embeddings of the vehicle documents, enabling fast retrieval of relevant information via semantic search. This is critical for a question-answering system with a large document corpus.
Databricks Reference: 'Vector databases enable scalable retrieval of context from large datasets' ('Databricks Generative AI Engineer Guide').
Option D: Embedding model
An embedding model converts text (documents and queries) into vector representations for similarity search. It's a foundational component for retrieval-augmented generation (RAG) in chatbots.
Databricks Reference: 'Embedding models transform text into vectors, facilitating efficient matching of queries to documents' ('Building LLM-Powered Applications').
Conclusion: Option B is not a useful component in building the chatbot. It's a user-facing suggestion rather than a technical building block, and it could even degrade performance by introducing unnecessary complexity. Options A, C, and D are all integral to a Databricks-aligned chatbot architecture.
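The three useful components (embedding model, vector database, response-generating LLM) fit together in a standard retrieval-augmented generation (RAG) pipeline. The sketch below is a minimal, self-contained illustration: bag-of-words vectors and cosine similarity stand in for a real embedding model and vector database, and `generate_answer` is a hypothetical placeholder for the response-generating LLM call.

```python
import math
from collections import Counter

# Toy "embedding model": bag-of-words counts. A production system would
# use a trained embedding model served from the platform instead.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "vector database": a list of (embedding, document) pairs.
documents = [
    "Model X brake pads should be replaced every 50000 km.",
    "Model Y uses a 2.0 litre turbocharged engine.",
]
index = [(embed(d), d) for d in documents]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

def generate_answer(query, context):
    # Placeholder for the response-generating LLM.
    return f"Based on our records: {context[0]}"

query = "When should I replace the brake pads on my Model X?"
context = retrieve(query)
print(generate_answer(query, context))
```

The same division of labor holds at scale: the embedding model produces the vectors, the vector database ranks candidates, and the LLM turns retrieved context into an answer.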
A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLMs on this task and pick the best one. They have an evaluation set with known high-quality translation examples, and they want to evaluate each LLM against it using a performant metric.
Which metric should they choose for this evaluation?
Correct : B
The task is to benchmark LLMs for text translation using an evaluation set with known high-quality examples, requiring a performant metric. Let's evaluate the options.
Option A: ROUGE metric
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures overlap between generated and reference texts, primarily for summarization. It's less suited for translation, where precision and word order matter more.
Databricks Reference: 'ROUGE is commonly used for summarization, not translation evaluation' ('Generative AI Cookbook,' 2023).
Option B: BLEU metric
BLEU (Bilingual Evaluation Understudy) evaluates translation quality by comparing n-gram overlap with reference translations, accounting for precision and brevity. It's widely used, performant, and appropriate for this task.
Databricks Reference: 'BLEU is a standard metric for evaluating machine translation, balancing accuracy and efficiency' ('Building LLM Applications with Databricks').
Option C: NDCG metric
NDCG (Normalized Discounted Cumulative Gain) assesses ranking quality, not text generation. It's irrelevant for translation evaluation.
Databricks Reference: 'NDCG is suited for ranking tasks, not generative output scoring' ('Databricks Generative AI Engineer Guide').
Option D: RECALL metric
Recall measures the fraction of relevant items retrieved, but it doesn't evaluate translation quality (e.g., fluency, word order, correctness). It's incomplete for this use case.
Databricks Reference: No specific extract, but recall alone lacks the granularity of BLEU for text generation tasks.
Conclusion: Option B (BLEU) is the best metric for translation evaluation, offering a performant and standard approach, as endorsed by Databricks' guidance on generative tasks.
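To make the BLEU intuition concrete, the sketch below computes modified n-gram precisions with a brevity penalty for a single candidate/reference pair. It is a deliberately simplified, single-reference version for illustration; real benchmarking should use an established implementation such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        # Clip each n-gram count by its count in the reference.
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = sum(c.values())
        if total == 0:
            return 0.0
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 3))
```

A perfect match scores 1.0, zero n-gram overlap scores 0.0, and partial matches fall in between, which is exactly the ranking behavior needed to compare candidate LLM translations against the high-quality reference set.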
A Generative AI Engineer wants the fine-tuned LLMs in their prod Databricks workspace to be available for testing in their dev workspace as well. All of their workspaces are Unity Catalog enabled, and they are currently logging their models into the Model Registry in MLflow.
What is the most cost-effective and secure option for the Generative AI Engineer to accomplish their goal?
Correct : D
The goal is to make fine-tuned LLMs from a production (prod) Databricks workspace available for testing in a development (dev) workspace, leveraging Unity Catalog and MLflow, while ensuring cost-effectiveness and security. Let's analyze the options.
Option A: Use an external model registry which can be accessed from all workspaces
An external registry adds cost (e.g., hosting fees) and complexity (e.g., integration, security configurations) outside Databricks' native ecosystem, reducing security compared to Unity Catalog's governance.
Databricks Reference: 'Unity Catalog provides a centralized, secure model registry within Databricks' ('Unity Catalog Documentation,' 2023).
Option B: Setup a script to export the model from prod and import it to dev
Export/import scripts require manual effort, storage for model artifacts, and repeated execution, increasing operational cost and risk (e.g., version mismatches, unsecured transfers). It's less efficient than a native solution.
Databricks Reference: Manual processes are discouraged when Unity Catalog offers built-in sharing: 'Avoid redundant workflows with Unity Catalog's cross-workspace access' ('MLflow with Unity Catalog').
Option C: Setup a duplicate training pipeline in dev, so that an identical model is available in dev
Duplicating the training pipeline doubles compute and storage costs, as it retrains the model from scratch. It's neither cost-effective nor necessary when the prod model can be reused securely.
Databricks Reference: 'Re-running training is resource-intensive; leverage existing models where possible' ('Generative AI Engineer Guide').
Option D: Use MLflow to log the model directly into Unity Catalog, and enable READ access in the dev workspace to the model
Unity Catalog, integrated with MLflow, allows models logged in prod to be centrally managed and accessed across workspaces with fine-grained permissions (e.g., READ for dev). This is cost-effective (no extra infrastructure or retraining) and secure (governed by Databricks' access controls).
Databricks Reference: 'Log models to Unity Catalog via MLflow, then grant access to other workspaces securely' ('MLflow Model Registry with Unity Catalog,' 2023).
Conclusion: Option D leverages Databricks' native tools (MLflow and Unity Catalog) for a seamless, cost-effective, and secure solution, avoiding external systems, manual scripts, or redundant training.
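In practice, option D amounts to pointing MLflow at Unity Catalog and registering the model under a three-level name. The snippet below is an illustrative, workspace-dependent sketch, not meant to run outside Databricks; the catalog, schema, and model names (`prod.llm_models.vehicle_qa_llm`) and the run URI placeholder are hypothetical.

```python
import mlflow

# Point the MLflow client at Unity Catalog instead of the
# workspace-local Model Registry.
mlflow.set_registry_uri("databricks-uc")

# Register the fine-tuned model under a three-level Unity Catalog
# name (catalog.schema.model). Names here are hypothetical examples.
mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="prod.llm_models.vehicle_qa_llm",
)
```

Access from the dev workspace is then controlled with standard Unity Catalog privileges granted on the registered model (and USAGE on its catalog and schema); check the current Unity Catalog documentation for the exact privilege names and GRANT syntax.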
A Generative AI Engineer is using an LLM to classify species of edible mushrooms based on text descriptions of certain features. The model returns accurate responses in testing, and the engineer is confident they have the correct list of possible labels, but the output frequently contains additional reasoning when the engineer only wants the label with no additional text.
Which action should they take to elicit the desired behavior from this LLM?
Correct : D
The LLM classifies mushroom species accurately but includes unwanted reasoning text, and the engineer wants only the label. Let's assess how to control output format effectively.
Option A: Use few shot prompting to instruct the model on expected output format
Few-shot prompting provides examples (e.g., input: description, output: label). It can work but requires crafting multiple examples, which is effort-intensive and less direct than a clear instruction.
Databricks Reference: 'Few-shot prompting guides LLMs via examples, effective for format control but requires careful design' ('Generative AI Cookbook').
Option B: Use zero shot prompting to instruct the model on expected output format
Zero-shot prompting relies on a single instruction (e.g., "Return only the label") without examples. It's simpler than few-shot but may not consistently enforce succinctness if the LLM's default behavior is verbose.
Databricks Reference: 'Zero-shot prompting can specify output but may lack precision without examples' ('Building LLM Applications with Databricks').
Option C: Use zero shot chain-of-thought prompting to prevent a verbose output format
Chain-of-thought (CoT) prompting encourages step-by-step reasoning, which increases verbosity; that is the opposite of the desired outcome and contradicts the goal of label-only output.
Databricks Reference: 'CoT prompting enhances reasoning but often results in detailed responses' ('Databricks Generative AI Engineer Guide').
Option D: Use a system prompt to instruct the model to be succinct in its answer
A system prompt (e.g., "Respond with only the species label, no additional text") sets a global instruction for the LLM's behavior. It's direct, reusable, and effective for controlling output style across queries.
Databricks Reference: 'System prompts define LLM behavior consistently, ideal for enforcing concise outputs' ('Generative AI Cookbook,' 2023).
Conclusion: Option D is the most effective and straightforward action, using a system prompt to enforce succinct, label-only responses, aligning with Databricks' best practices for output control.
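As an illustration, a system prompt in a chat-style request might look like the sketch below. It uses the common role/content message schema; the exact payload shape depends on the serving endpoint, and the wording of the prompt is an example, not a prescribed template.

```python
def build_messages(description):
    """Build a chat payload whose system prompt pins the output format
    to a bare species label."""
    return [
        {
            "role": "system",
            "content": (
                "You are a mushroom species classifier. "
                "Respond with only the species label. "
                "Do not include reasoning or any additional text."
            ),
        },
        {"role": "user", "content": description},
    ]

messages = build_messages(
    "White cap, free gills, ring on stem, grows in grassland."
)
```

Because the system message applies to every request, the label-only constraint holds across all user queries without rewriting each prompt.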
A Generative AI Engineer is building an LLM-based application that has an important transcription (speech-to-text) task. Speed is essential for the success of the application.
Which open Generative AI models should be used?
Correct : D
The task requires an open generative AI model for a transcription (speech-to-text) task where speed is essential. Let's assess the options based on their suitability for transcription and performance characteristics, referencing Databricks' approach to model selection.
Option A: Llama-2-70b-chat-hf
Llama-2 is a text-based LLM optimized for chat and text generation, not speech-to-text. It lacks transcription capabilities.
Databricks Reference: 'Llama models are designed for natural language generation, not audio processing' ('Databricks Model Catalog').
Option B: MPT-30B-Instruct
MPT-30B is another text-based LLM focused on instruction-following and text generation, not transcription. It's irrelevant for speech-to-text tasks.
Databricks Reference: No specific mention, but MPT is categorized under text LLMs in Databricks' ecosystem, not audio models.
Option C: DBRX
DBRX, developed by Databricks, is a powerful text-based LLM for general-purpose generation. It doesn't natively support speech-to-text and isn't optimized for transcription.
Databricks Reference: 'DBRX excels at text generation and reasoning tasks' ('Introducing DBRX,' 2024); no mention of audio capabilities.
Option D: whisper-large-v3 (1.6B)
Whisper, developed by OpenAI, is an open-source model specifically designed for speech-to-text transcription. The "large-v3" variant (about 1.6 billion parameters) balances accuracy and efficiency, and can be further optimized for speed via quantization or GPU deployment, which is key for the application's requirements.
Databricks Reference: 'For audio transcription, models like Whisper are recommended for their speed and accuracy' ('Generative AI Cookbook,' 2023). Databricks supports Whisper integration in its MLflow or Lakehouse workflows.
Conclusion: Only D. whisper-large-v3 is a speech-to-text model, making it the sole suitable choice. Its design prioritizes transcription, and its efficiency (e.g., via optimized inference) meets the speed requirement, aligning with Databricks' model deployment best practices.
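For reference, Whisper can be loaded through the Hugging Face transformers automatic-speech-recognition pipeline. The snippet below is an illustrative sketch only: it downloads a large checkpoint and benefits from a GPU, so it is not meant to run as-is in a lightweight environment, and the audio filename is hypothetical.

```python
from transformers import pipeline

# Load the open Whisper large-v3 checkpoint for speech-to-text.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    device_map="auto",  # place on GPU if available, for speed
)

result = asr("customer_call.wav")  # hypothetical audio file
print(result["text"])
```

For tighter latency budgets, smaller Whisper variants or quantized builds trade some accuracy for additional speed.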
Total 73 questions