juanfra218/text2sql


juanfra218 committed on Aug 2, 2024

Commit 6d446a9 · verified · 1 Parent(s): c9e0730

Update README.md

Files changed (1)

README.md +61 -3

README.md CHANGED

@@ -1,3 +1,61 @@
----
-license: mit
----

+---
+license: mit
+---
+# Fine-Tuned Google T5 Model for Text to SQL Translation
+This repository contains a fine-tuned version of the Google T5 model, specifically trained for the task of translating natural language queries into SQL statements.
+## Model Details
+- **Architecture**: Google T5 (Text-to-Text Transfer Transformer)
+- **Task**: Text to SQL Translation
+- **Fine-Tuning Datasets**:
+  - [sql-create-context Dataset](https://huggingface.co/datasets/b-mc2/sql-create-context)
+  - [Synthetic-Text-To-SQL Dataset](https://huggingface.co/datasets/gretelai/synthetic-text-to-sql)
+## Fine-Tuning Datasets
+1. **sql-create-context Dataset**:
+   - This dataset was created by modifying data from the following sources:
+     - Zhong, Victor, et al. "Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning." (2017).
+     - Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task." (2018).
+   - Citation:
+     ```bibtex
+     @misc{b-mc2_2023_sql-create-context,
+       title   = {sql-create-context Dataset},
+       author  = {b-mc2},
+       year    = {2023},
+       url     = {https://huggingface.co/datasets/b-mc2/sql-create-context},
+       note    = {This dataset was created by modifying data from the following sources: \cite{zhongSeq2SQL2017, yu2018spider}.},
+     }
+     ```
+2. **Synthetic-Text-To-SQL Dataset**:
+   - A synthetic dataset for training language models to generate SQL queries from natural language prompts.
+   - Citation:
+     ```bibtex
+     @software{gretel-synthetic-text-to-sql-2024,
+       author = {Meyer, Yev and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Boyd, Kendrick and Van Segbroeck, Maarten and Grossman, Matthew and Mlocek, Piotr and Newberry, Drew},
+       title = {{Synthetic-Text-To-SQL}: A synthetic dataset for training language models to generate SQL queries from natural language prompts},
+       month = {April},
+       year = {2024},
+       url = {https://huggingface.co/datasets/gretelai/synthetic-text-to-sql}
+     }
+     ```
+## Ongoing Work
+I am currently working to implement PICARD (Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models) to improve the results of this model. More details can be found in the original PICARD paper:
+- Citation:
+  ```bibtex
+  @misc{scholak2021picardparsingincrementallyconstrained,
+        title={PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models},
+        author={Torsten Scholak and Nathan Schucher and Dzmitry Bahdanau},
+        year={2021},
+        eprint={2109.05093},
+        archivePrefix={arXiv},
+        primaryClass={cs.CL},
+        url={https://arxiv.org/abs/2109.05093},
+  }