<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

</head>

<body>

<div style="font-family: Tahoma, Geneva, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">

<span style="font-size: 14pt;"><b>"The Syntactic Acceptability Dataset as a resource for machine learning and linguistic analysis"

</b></span>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0"><span style="font-size: 14pt;"><b>Tom Juzek</b></span></div>

<div class="ContentPasted0">Assistant Professor,</div>

<div class="ContentPasted0">Department of Modern Languages and Linguistics,</div>

<div class="ContentPasted0">Florida State University</div>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0">NOTE: Please feel free to forward/share this invitation with other groups/disciplines that might be interested in this talk/topic.

<b>All are welcome to attend. </b></div>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0"><b>https://fsu.zoom.us/j/94273595552 </b></div>

<div class="ContentPasted0">Meeting # <b>942 7359 5552 </b></div>

<div class="ContentPasted0"> </div>

<div><b style="color: inherit; font-family: inherit; font-size: inherit; font-style: inherit; font-variant-ligatures: inherit; font-variant-caps: inherit;">Wednesday, Oct 26, 2022</b> Schedule:<br>

</div>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0">* 3:00 to 3:30 PM Eastern Time (US and Canada) </div>

<div class="ContentPasted0">Nespresso & Teatime (in 417 DSL Commons) </div>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0">* <b>3:30 to 4:30 PM</b> Eastern Time (US and Canada)

</div>

<div class="ContentPasted0"><b>Colloquium</b> - Attend in person (in 499 DSL) or virtually (via Zoom)

</div>

<div><br class="ContentPasted0">

</div>

<div class="ContentPasted0"><b>Abstract</b>: </div>

<div class="ContentPasted0">Linguistic datasets are popular in machine learning, particularly in the emerging field of few-shot learning (learning from limited data): linguistic data is often complex and difficult to generalize from, which makes it a welcome challenge (Wang et al. 2020). In this talk, I will outline ongoing research on building a new dataset that is valuable to both the machine learning community and the linguistic community. The new dataset will be based on CoLA (Corpus of Linguistic Acceptability; Warstadt et al. 2018), a popular dataset in machine learning. I will briefly introduce CoLA, the challenges it poses, and the relevant linguistic distinction between acceptability and grammaticality. Further, I will motivate the need for new data of a different kind, outline the structure of the new dataset, and discuss its expected relevance to machine learning and linguistics.</div>

<br>

</div>

</body>

</html>